1  Objectives and Accomplishments

1.1  Objectives 

Our  proposed  objectives  were  twofold: 

Objective  1 

To develop a unified controller design methodology for manufacturing automation systems
and  to  demonstrate  the  approach  on  a  manufacturing  process  of  interest  to  DARPA. 

Objective  2 

To provide specifications for computer-aided-control-engineering (CACE) design tools that
appeal  to  the  needs  and  skill  level  of  process  control  engineers. 

These  objectives  were  aimed  at  the  general  process  control  industry,  in  the  hopes  of  extending 
the  scope  of  the  DARPA  initiative  known  as  IPM  (Intelligent  Processing  of  Materials),  i.e.,  to  merge 
recent  advances  in  process  modeling,  sensor  development,  and  computer-aided-control-engineering. 


1.2  Accomplishments 

The  demonstration  system  selected  was  rapid  thermal  processing  (RTP)  of  semiconductor  wafers. 
This  novel  approach  in  integrated  circuit  manufacturing  demands  fast  tracking  control  laws  that 
achieve  near  uniform  spatial  temperature  distributions.  In  order  to  ensure  the  final  product  quality, 
it  is  essential  to  maintain  a  uniform  temperature  profile  despite  uncertainties  in  both  transient  and 
steady-state  phases  of  the  process.  Hence,  the  high  performance  requirements  for  RTP  make  it  an 
excellent candidate to meet both of our overall objectives. Another reason for selecting RTP is
our physical proximity to Applied Materials Research, Inc. (AMAT). As a result we
were able to test the feasibility of many of our ideas using AMAT facilities.*

Specific  accomplishments  are: 

•  Heat  transfer  modeling 

•  Model  reduction 

•  Robust  thermal  control 

•  Feedforward  learning  control 

A  summary  of  our  work  in  each  of  these  areas  is  described  in  the  next  section. 


*The data presented here and in our related publications are based on a generic RTP model and do not use the
actual data obtained from any RTP system developed by AMAT.

2  Summary


Further  details  of  the  work  summarized  in  this  section  can  be  found  in  a  series  of  published  papers 
which  are  included  in  the  appendix  of  this  report. 


2.1  Heat  transfer  modeling 

Predictive  process  models  are  crucial  to  accomplishing  both  path  planning  and  feedback  control 
especially when some of the key performance variables cannot be sensed in situ. In general,
writing  relations  with  the  goal  of  completely  describing  the  physics,  micromechanics,  or  material 
science  phenomena  can  produce  different  results  from  writing  relations  with  the  goal  of  using  them 
to  design  robust  paths  and  robust  feedback  controls. 

Thermal systems are ubiquitous in material and semiconductor manufacturing systems. Thermal
systems can be physically modeled using finite elements consisting of first order systems of
ordinary differential equations. Most of the predominant thermal effects can be modeled in this
way, e.g., radiation, conduction, and convection. The radiation effects are nonlinear, involving
quartic temperature terms. Radiation has a significantly larger effect in an RTP chamber than in a
horizontal  belt  furnace  principally  because  the  RTP  chamber  is  much  smaller  and  the  radiation 
effects  diminish  rapidly  with  distance.  As  a  result,  furnace  thermal  models  are  essentially  linear 
whereas  RTP  systems  are  not,  but  can  be  well  represented  by  linear  models  over  a  reasonably  large 
region  centered  about  each  critical  operating  point. 
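The claim that a radiative term is well represented by a linear model near an operating point can be illustrated with a minimal sketch. It assumes a single radiative branch of the form k(T1^4 - T2^4); the coefficient k, the 1000 K operating point, and the function names are illustrative assumptions, not values from the RTP model.

```python
def radiative_flux(k, t_hot, t_cold):
    """Net radiative exchange between two nodes: k * (t_hot^4 - t_cold^4)."""
    return k * (t_hot ** 4 - t_cold ** 4)


def small_signal_conductance(k, t0):
    """Derivative of k * T^4 with respect to T at the operating point t0."""
    return 4.0 * k * t0 ** 3


def linearization_error(k, t0, dt):
    """Relative error of the linear model when one node moves dt away from t0."""
    exact = radiative_flux(k, t0 + dt, t0)
    approx = small_signal_conductance(k, t0) * dt
    return abs(exact - approx) / abs(exact)


if __name__ == "__main__":
    k = 5.67e-8  # illustrative coefficient only
    for dt in (10.0, 50.0, 200.0):
        # The error grows with the size of the excursion from the operating point.
        print(dt, linearization_error(k, 1000.0, dt))
```

The relative error is independent of k, so the size of the region in which the linear model is adequate depends only on the excursion relative to the operating temperature.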

Finite  element  thermal  models  can  have  thousands  of  states  and  even  many  more  parameters, 
e.g.,  thermal  coefficients.  From  the  control  perspective,  the  structure  and  accuracy  of  the  model 
is  most  important.  In  addition,  the  large  number  of  variables  requires  numerical  software  which  is 
efficient  and  fast.  The  major  goal  in  developing  control  software  for  these  systems  is  to  allow  the  user 
to establish various design tradeoffs, e.g., the tradeoff between uncertainty and performance.

We  have  developed  a  systematic  procedure  of  generating  nonlinear  finite-element  heat  transfer 
models  based  on  a  set  of  node  temperatures  and  associated  heat  transfer  characteristics  between 
the  nodes.  A  circuit-theoretic  approach  is  adopted  due  to  the  ease  of  constructing,  modifying  and 
verifying the models. With increasing resolution, and hence a large number of nodes and branches,
the  resulting  interconnections  can  easily  become  cumbersome  and  even  impossible  to  describe  and 
manipulate  without  the  aid  of  computers.  Hence,  a  systematic  approach  to  modeling  is  crucial 
to  allow  the  engineer  to  truly  concentrate  on  model  validation  rather  than  error-prone,  laborious 
technicalities  in  generating  the  model. 

For heat transfer models based on conduction, convection and radiation characteristics of components,
the resulting nonlinear dynamic system equations can easily be described in terms of node
equations.  This  approach  relies  on  three  concepts:  graph,  branch  characteristics  and  a  conservation 
law.  Specifically,  a  graph  is  a  collection  of  nodes  and  directed  branches  connecting  these  nodes.  It 
is  a  means  of  formalizing  the  interactions  based  on  the  physical  properties  of  the  subsystems  and 
incorporating  prior  knowledge/observations  about  the  model. 
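The three concepts above (graph, branch characteristics, conservation law) can be sketched in a few lines. This is a minimal illustration under assumed conventions, not the software developed under the contract: the branch tuple layout, the branch kinds "linear" and "radiation", and the function name are all hypothetical.

```python
SIGMA = 5.67e-8  # Stefan-Boltzmann constant in W/(m^2 K^4)


def node_residuals(temps, branches, sources, datum=(0,)):
    """Net heat flow into every non-datum node; all zeros at steady state.

    temps    : dict mapping node -> temperature in kelvin
    branches : list of (kind, i, j, coeff); heat flows from node i to node j
    sources  : dict mapping node -> injected heat flux
    datum    : nodes held at a fixed temperature (no balance equation)
    """
    residuals = {n: sources.get(n, 0.0) for n in temps if n not in datum}
    for kind, i, j, coeff in branches:
        if kind == "linear":  # conduction or convection branch
            q = coeff * (temps[i] - temps[j])
        elif kind == "radiation":  # quartic branch characteristic
            q = coeff * SIGMA * (temps[i] ** 4 - temps[j] ** 4)
        else:
            raise ValueError(f"unknown branch kind: {kind}")
        # Conservation law: flow leaving node i enters node j.
        if i in residuals:
            residuals[i] -= q
        if j in residuals:
            residuals[j] += q
    return residuals


if __name__ == "__main__":
    temps = {0: 300.0, 1: 900.0, 2: 850.0}
    branches = [("linear", 1, 2, 0.5), ("radiation", 1, 0, 1e-3), ("radiation", 2, 0, 1e-3)]
    print(node_residuals(temps, branches, sources={1: 100.0}))
```

Adding or removing a branch changes one list entry rather than the equations themselves, which is the practical benefit of the circuit-theoretic description.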

In our work to date we have developed software which accepts as inputs the parameters associated
with finite element thermal models. An additional feature of the software is that uncertainties
in the model are directly associated with uncertain physical system parameters, e.g., thermal
coefficients. The uncertain parameters can be either identified or used in a robust control design.

2.2  Robust thermal control

Rapid  thermal  processing  of  semiconductor  wafers  demands  fast  tracking  control  laws  that  achieve 
near  uniform  spatial  temperature  distributions.  In  order  to  ensure  the  final  product  quality,  it 
is  essential  to  maintain  a  uniform  temperature  profile  despite  uncertainties  in  both  transient  and 
steady-state  phases  of  the  process.  We  have  developed  solutions  and  associated  software  for  the 
design  of  robust  thermal  control  and  optimization.  For  a  given  operating  trajectory,  the  control 
problem  is  posed  as  determining  a  feedforward/feedback  controller  that  minimizes  the  worst-case 
peak  deviation  of  the  performance  variables  subject  to  a  particular  class  of  bounded  disturbances 
and  parameter  variations.  Since  the  solution  to  this  nonlinear  problem  is  not  known,  a  sequence  of 
approximations in terms of the small-signal equivalents is used to pose linear control problems. A
complete solution to the associated linear control design problem is derived. Considering problem
sizes of interest, efficient computational solution methods were investigated and prototype tools
developed to simplify repetitive performance/robustness tradeoff studies, as well as determining
sensor/actuator locations and operating points.

Specifically, (1) we have implemented efficient methods of calculating optimal controller gains
and performance/robustness tradeoffs; (2) we have developed a graphical user interface that
simplifies the design of these controllers, in such a way that the theoretical details are transparent to
the user.

The  results  have  taken  the  trial/error  studies  performed  on  the  physical  system  to  a  simulation 
level  study  where  worst-case  performance  can  be  quantified.  Apart  from  the  obvious  advantages  of 
having a representative off-line model of the real system, we have brought a systematic approach
to  quantify  the  following  factors: 

•  best  nominal  uniformity  at  steady-state, 

•  best  worst-case  uniformity  at  steady-state,  and 

•  best  nominal  transient  uniformity  during  ramp-ups  between  steady-states. 

This  systematic  approach  of  associating  performance  limitations  with  a  particular  chamber  model 
has allowed us to bring answers to crucial questions involving physical chamber design and its
implications for achievable closed-loop performance.

The results are described in detail in the following papers, which are included in the appendix:

1. Robert L. Kosut and M. Güntekin Kabuli, "Robust control of thermal processes: Static
performance," Proceedings of the 2nd International Rapid Thermal Processing Conference,
pp. 296-297, Monterey, California, September 1994.

2. M. Güntekin Kabuli, Robert L. Kosut and Stephen P. Boyd, "Improving static performance
robustness of thermal processes," Proceedings of the 33rd IEEE Conference on Decision and
Control, pp. 62-66, Lake Buena Vista, Florida, December 1994.

3. A. Emami-Naeini, M. G. Kabuli and R. L. Kosut, "Finite-Time Tracking with Actuator
Saturation: Application to RTP Temperature Trajectory Following," Proceedings of the 33rd
IEEE Conference on Decision and Control, pp. 73-78, Lake Buena Vista, Florida, December
1994.

4. Robert L. Kosut and M. Güntekin Kabuli, "On Actuator-Sensor Selection in Thermal
Processes," Proceedings of the 34th IEEE Conference on Decision and Control, New Orleans,
Louisiana, December 13-15, 1995.

5. Robert L. Kosut and M. Güntekin Kabuli, "On operating point sensitivity of thermal
processes," Proc. 1996 Triennial IFAC World Congress, San Francisco, California, USA, July
1-5, 1996.

2.3  Model  Reduction 

We  investigated  the  use  of  the  proper  orthogonal  decomposition  (POD)  method  for  model  reduction 
which  was  used  originally  for  approximating  turbulent  phenomena.  There  seems  to  be  no  end  to 
the  number  of  times  it  has  been  re-discovered,  e.g.,  it  appears  in  work  on  weather  prediction,  and  in 
pattern  recognition  where  it  is  known  as  the  Karhunen-Loeve  expansion.  More  recently  it  has  been 
used  to  reduce  the  dimensionality  of  the  ODEs  obtained  from  finite  element  analysis.  In  summary, 
the  POD  can  be  used  iteratively  to  find  a  parsimonious  set  of  basis  functions  for  the  finite  element 
analysis. 

The basic premise of the POD is the orthogonal decomposition of the spatial covariance of
the instantaneous spatial solution profiles of a PDE, or “snapshots” as they are often called. In
the  context  of  time-dependent  and  in  particular  turbulent  hydrodynamics,  the  POD  method  was 
used  on  spatial  velocity  correlations  to  identify  coherent  spatial  structures.  A  combination  of  this 
hierarchy  of  structures  with  a  Galerkin  weighted  residual  discretization  of  the  fundamental  model 
equations  provides  a  spatially  and  temporally  accurate  model  of  the  PDE  dynamics,  provided  that 
a  sufficient  number  of  modes  has  been  retained.  There  is  increasing  evidence,  through  the  study 
of  several  model  problems,  that  this  methodology  can  be  a  crucial  engineering  algorithmic  tool  for 
the  reduction,  analysis,  design  and  control  of  distributed  systems,  beyond  the  context  of  fluid  flow. 

The  POD  method  is  quite  general  and  essentially  relies  on  the  singular  value  decomposition.  It 
works  directly  on  the  state,  and  does  not  distinguish  outputs  as  do  balanced/truncation  methods 
involving  Hankel  singular  values.  Our  experience  with  POD  was  successful  for  reducing  high  order 
finite  element  models  of  rapid  thermal  processing  systems  as  described  in  the  following  paper  which 
is  included  in  the  appendix: 


H. Aling, R.L. Kosut, A. Emami-Naeini, and J. L. Ebert, “Nonlinear model reduction with
application to rapid thermal processing,” Proc. 35th IEEE CDC, pp. 4305-4310, Kobe, Japan,
Dec. 1996.
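The idea of extracting a dominant spatial mode from snapshots can be sketched as follows. This is a minimal, dependency-free illustration and not the procedure of the cited paper: it forms the snapshot correlation matrix and finds only its leading eigenvector by power iteration, whereas a practical implementation would use a singular value decomposition routine and retain several modes. All function names are hypothetical.

```python
import math


def correlation_matrix(snapshots):
    """Average outer product of the snapshots (each snapshot is a list of n values)."""
    n, m = len(snapshots[0]), len(snapshots)
    return [[sum(s[i] * s[j] for s in snapshots) / m for j in range(n)]
            for i in range(n)]


def leading_pod_mode(snapshots, iterations=100):
    """Return (mode, energy_fraction) for the dominant POD mode.

    Power iteration is started from a uniform vector, so it assumes the
    dominant mode is not orthogonal to that vector.
    """
    c = correlation_matrix(snapshots)
    n = len(c)
    v = [1.0 / math.sqrt(n)] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(c[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        eigenvalue = norm
    trace = sum(c[i][i] for i in range(n))
    energy = eigenvalue / trace if trace > 0.0 else 0.0
    return v, energy


def project(snapshot, mode):
    """Reduced coordinate of a snapshot along one mode."""
    return sum(a * b for a, b in zip(snapshot, mode))


if __name__ == "__main__":
    profile = [1.0, 2.0, 3.0]
    snaps = [[a * p for p in profile] for a in (1.0, 2.0, 3.0)]
    mode, energy = leading_pod_mode(snaps)
    print(mode, energy, project(snaps[0], mode))
```

The energy fraction indicates how much of the snapshot variation a single mode captures; when it is close to one, a very low order model may suffice.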


2.4  Feedforward  learning  control 

For RTP, the ability to quickly manipulate wafer temperature according to the commanded temperature
profile is crucial. Sensor-based feedback can certainly improve the RTP reactor's temperature
following capability, maintain tight temperature control at steady state, and reduce the effects due
to equipment variations. However, the bandwidth of feedback control must be balanced with stability
considerations which are often limited by the process characteristics such as time delay.

Feedforward control, on the other hand, can complement feedback control performance by
promoting non-delay and anticipatory actions which, when properly designed, can lead to superior
tracking or disturbance rejection. Combining feedback and feedforward control should lead to a
robust, stable and yet agile temperature control system.

Traditional feedforward design is usually based on analytical methods that require fairly accurate
modeling of the process and the feedback control loop. Such knowledge is often not available or
is subject to change over time. To overcome this problem we have developed a method for using
run-to-run data to modify a feedforward control signal. The result, which is principally useful for
repetitive run-to-run recipes, is a learning algorithm. The approach has its roots in disturbance
rejection of acoustic emissions where the disturbance is usually measured with a microphone.
Our application is in the context of feedback control. The specific approach is designed for ease
of tuning, as it can learn from past experience. Many manufacturing tasks are repetitive and
task-oriented, and are potential applications.
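The run-to-run idea can be sketched with a generic learning update, in which the tracking error recorded on one run corrects the feedforward signal used on the next. This is an illustration only and not the algorithm of the publications listed below: the first-order plant (standing in for the closed feedback loop), its coefficients, and the learning gain are all assumptions chosen so that the update converges.

```python
def simulate(u, a=0.3, b=0.7):
    """Hypothetical first-order response y[t+1] = a*y[t] + b*u[t], starting from rest."""
    y, out = 0.0, []
    for u_t in u:
        y = a * y + b * u_t
        out.append(y)
    return out


def learn_feedforward(reference, runs=20, gain=1.0):
    """Refine a feedforward signal over repeated runs of the same recipe.

    After each run the recorded error is added, scaled by `gain`, to the
    feedforward signal. Returns the final signal and the peak error per run.
    """
    u = [0.0] * len(reference)
    peak_errors = []
    for _ in range(runs):
        y = simulate(u)
        error = [r - y_t for r, y_t in zip(reference, y)]
        peak_errors.append(max(abs(e) for e in error))
        u = [u_t + gain * e_t for u_t, e_t in zip(u, error)]
    return u, peak_errors


if __name__ == "__main__":
    ramp_and_hold = [min(t / 5.0, 1.0) for t in range(15)]
    _, peaks = learn_feedforward(ramp_and_hold)
    print(peaks[0], peaks[-1])
```

No plant model enters the update; only the recorded error does. The gain must still be chosen small enough for the actual process, which is the tuning issue mentioned above.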

The  feedforward  learning  control  approach  is  described  in  detail  in  the  following  publications 
which  are  included  in  the  appendix: 

1. K.M. Tao, R.L. Kosut and G. Aral, “Learning Feedforward Control,” Proc. American Control
Conference, Baltimore, MD, June - July 1994, pp. 2575 - 2579.

2. K.M. Tao, G. Aral, R.L. Kosut and M. Ekblad, “Feedforward Learning Methods in RTP
Temperature Control,” Proc. 2nd Int. Rapid Thermal Processing Conf., Monterey, CA,
Sept. 1994, pp. 278 - 282.

3. K.M. Tao, R.L. Kosut, M. Ekblad and G. Aral, “Feedforward Learning Applied to RTP of
Semiconductor Wafers,” Proc. 33rd IEEE Conf. Decision and Control, pp. 67-72, Lake
Buena Vista, FL, December 1994.

4. K.M. Tao, R.L. Kosut and M. Ekblad, “Feedforward Learning - Nonlinear Processes and
Adaptation,” Proc. 33rd IEEE Conf. Decision and Control, pp. 1060-1065, Lake Buena
Vista, FL, December 1994.


3  Transitions 

The feasibility of many of the techniques developed under this contract for modeling, control design,
and optimization was successfully tested on one or more of Applied Materials' RTP chambers.

A.1  Robust control of thermal processes: Static performance

In this paper the steady-state sensitivity of static nonlinear heat transfer
models subject to static feedforward/feedback control laws is considered.
For a given operating point, the desired control problem is posed
as determining a feedforward/feedback static controller that minimizes
the worst-case peak deviation of the performance variables about a
nominal point subject to a particular class of bounded disturbances
and parameter variations. Since the solution to this nonlinear problem
is not known, a sequence of approximations in terms of the small-signal
equivalents is used to pose static linear control problems. A complete
solution to the associated static linear control design problem is
derived. Considering problem sizes of interest, efficient computational
solution methods are investigated and prototype tools are developed to
simplify repetitive performance/robustness tradeoff studies, as well as
determining sensor/actuator locations and operating points.


Introduction 

Rapid thermal processing of semiconductor wafers demands
fast tracking control laws that achieve near uniform spatial
temperature distributions. In order to ensure the final product
quality, it is essential to maintain a uniform temperature profile
despite uncertainties in both transient and steady-state phases of
the process.

In this paper we focus exclusively on the steady-state
sensitivity of static nonlinear heat transfer models subject to static
feedforward/feedback control laws. The approach relies on a
static nonlinear heat transfer model, typically obtained by forming
a mesh of branches that model conduction, convection and
radiation between the nodes of the mesh. A systematic modeling
approach based on the analysis of large scale nonlinear resistive
networks is applied to obtain the equations that determine the
operating points in terms of input and disturbance biases. For
a more detailed treatment of the associated concepts utilized,
see, e.g., [1] for the circuit-theoretic approach and [2] for heat
transfer topics. For a given operating point, the desired control
problem is posed as determining a feedforward/feedback static
controller that minimizes the worst-case peak deviation of the
performance variables about a nominal point subject to a
particular class of bounded disturbances and parameter variations.
Since the solution to this nonlinear problem is not known, a
sequence of approximations in terms of the small-signal
equivalents is used to pose static linear control problems.

Problem  Description 

Consider the feedback system shown in Figure 1, where

•  $w \in \mathbb{R}^{n_w}$ are the exogenous inputs consisting of both static
disturbances and uncertain physical parameters,

•  $z \in \mathbb{R}^{n_z}$ are the performance variables (not necessarily
those that are sensed) such as critical temperatures and
lamp (actuator) power levels,

•  $y \in \mathbb{R}^{n_y}$ are the sensed variables, e.g., temperature
readings from various sensors,

•  $u \in \mathbb{R}^{n_u}$ are the actuator inputs, e.g., lamp powers.

The static linear plant $P \in \mathbb{R}^{(n_z+n_y)\times(n_w+n_u)}$ is obtained by
linearization about a particular equilibrium or operating point.
The actuator input consists of a static feedforward control
$u_K \in \mathbb{R}^{n_u}$ and a static feedback $Ky$ with $K \in \mathbb{R}^{n_u \times n_y}$.

Figure 1: Static feedback system

It is convenient to partition $P$ as follows:

$$\begin{bmatrix} z \\ y \end{bmatrix} = \begin{bmatrix} P_{zw} & P_{zu} \\ P_{yw} & P_{yu} \end{bmatrix} \begin{bmatrix} w \\ u \end{bmatrix}.$$


Let $w_0 \in \mathbb{R}^{n_w}$ and $z_0 \in \mathbb{R}^{n_z}$ denote, respectively, nominal
values of disturbance and performance variables. Let $\Lambda_w$ and
$\Lambda_z$ be nonsingular diagonal scaling matrices such that

$$w = w_0 + \Lambda_w \eta_w \quad\text{and}\quad z = z_0 + \Lambda_z \eta_z .$$

The scaling matrices are chosen to reflect the relative size of
uncertainty in the $w$-variables and the relative size of performance
tolerance in the $z$-variables. The $\eta$-variables reflect uncertainty
($\|\eta_w\|_\infty \le \lambda_w$) and performance ($\|\eta_z\|_\infty \le \lambda_z$). The norm
$\|\cdot\|_\infty$ is defined as $\|x\|_\infty = \max_i |x_i|$. For a given $\lambda_w$ and $\lambda_z$,
a design $(K, u_K)$ is said to be

$(\lambda_w, \lambda_z)$-feasible iff $\|\eta_z\|_\infty \le \lambda_z$ for all $\|\eta_w\|_\infty \le \lambda_w$.

The specific design problem under consideration is the following:

•  For a given $P$, $\lambda_w$, $w_0$, $z_0$ and nonsingular $\Lambda_w$, $\Lambda_z$, determine

$$\arg\min_{(K,\,u_K)} \ \max_{\|\eta_w\|_\infty \le \lambda_w} \|\eta_z\|_\infty . \tag{1}$$

Due to the two free parameters $K$ and $u_K$, one can pose four
possible cases from problem (1) with appropriate restrictions on
the parameters, namely:

1. open-loop ($u = 0$),

2. feedforward only ($u = u_K$),

3. feedback only ($u = Ky$) and

4. feedforward/feedback ($u = u_K + Ky$).

Typical  design  specifications  involve  the  following: 

1. feasibility: determine (if possible) a $(\lambda_w, \lambda_z)$-feasible design, or

2. robustness: for a fixed $\lambda_z$, maximize $\lambda_w$ among
$(\lambda_w, \lambda_z)$-feasible designs, or

3. performance: for a fixed $\lambda_w$, minimize $\lambda_z$ among
$(\lambda_w, \lambda_z)$-feasible designs.

The feasibility, robustness and performance design problems
stated above can all be expressed in terms of problem (1) by
considering the tradeoff curve denoted by the graph

$$\Bigl\{ (\lambda_w, \lambda_z) \ \Big|\ \lambda_w \ge 0,\ \ \lambda_z = \min_{(K,\,u_K)} \ \max_{\|\eta_w\|_\infty \le \lambda_w} \|\eta_z\|_\infty \Bigr\}. \tag{2}$$

The graph in (2) denotes the boundary of feasible and infeasible
designs on the $(\lambda_w, \lambda_z)$-plane. In general, the graph that
partitions the feasible and infeasible regions is neither convex
nor concave. In the following section, we describe a solution
method for problem (1) and hence a method for deriving the
graph in (2).
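The feasibility test defined above can be checked numerically once a design has been fixed. The sketch below assumes that, for that design, the normalized closed-loop relation has already been reduced to an affine map from the normalized disturbance to the normalized performance deviation; the names `a`, `b`, `lam_w` and `lam_z` are illustrative. It relies on the standard fact that the worst case of an affine map over an infinity-norm ball is attained at a vertex, giving a weighted max-row-sum.

```python
def worst_case_peak(a, b, lam_w):
    """Largest infinity norm of a*eta + b over all eta with max|eta_j| <= lam_w.

    a is a list of rows, b a list with one offset per row. Row i attains
    lam_w * sum_j |a[i][j]| + |b[i]| by aligning the signs of eta with b[i].
    """
    return max(lam_w * sum(abs(x) for x in row) + abs(b_i)
               for row, b_i in zip(a, b))


def is_feasible(a, b, lam_w, lam_z):
    """True if the design keeps the normalized deviation within lam_z."""
    return worst_case_peak(a, b, lam_w) <= lam_z


if __name__ == "__main__":
    a = [[1.0, -2.0], [0.5, 0.5]]
    b = [0.5, -1.0]
    print(worst_case_peak(a, b, 2.0), is_feasible(a, b, 2.0, 7.0))
```

Sweeping `lam_w` and recording the worst-case peak for a fixed design traces that design's own curve, which lies on or above the optimal tradeoff curve.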

Solution 

The general form of the solution to obtain the graph in
(2) is given in the Appendix. The results can be derived using
the static form of the Youla parametrization of all feasible
controllers [5]. The details will be reported elsewhere.

For our purposes here, all possible four cases reduce to a
problem of the form $\min_X \|T_1 + T_2 X T_3\|_{i,\infty}$, where $\|\cdot\|_{i,\infty}$ denotes
the matrix norm induced by the vector norm $\|\cdot\|_\infty$. Efficient
solution methods based on interior point methods are used to
solve the associated linear program. In the process, $n_u(n_y + 1)$
control variables are sought by introducing $n_z(n_w + 1) + 1$
more slack variables. Typically $n_z n_w \gg n_u n_y$; hence, special
structure and/or conditioning has to be utilized in order not to
be burdened by the increase in the parameter dimension. The
results in the following section are obtained by prototype
optimization tools developed in MATRIXx based on the methods in
[3] and [4].
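The reduction of the induced-norm minimization to a linear program can be sketched as follows. This is the standard epigraph construction for a max-row-sum norm, given here as an illustration rather than as the authors' formulation; the dimensions $p$ and $q$ (rows and columns of $T_1 + T_2 X T_3$), the slack matrix $S$ and the scalar bound $t$ are symbols introduced only for this sketch.

```latex
\begin{aligned}
\min_{X,\,S,\,t}\quad & t \\
\text{subject to}\quad
  & -S_{ij} \;\le\; \bigl(T_1 + T_2 X T_3\bigr)_{ij} \;\le\; S_{ij},
  && i = 1,\dots,p,\quad j = 1,\dots,q, \\
  & \sum_{j=1}^{q} S_{ij} \;\le\; t,
  && i = 1,\dots,p .
\end{aligned}
```

At the optimum each $S_{ij}$ equals the absolute value of the corresponding entry, so $t$ equals the largest absolute row sum. The construction adds $pq + 1$ variables to the entries of $X$, which is why the number of auxiliary variables grows with the number of performance and disturbance variables rather than with the size of the controller.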

Design  Example 

Consider the mesh in Figure 2. It represents a linear resistive
network consisting of seven nodes and nine branches. All
branch conductances are taken as unity; node 0 denotes the datum
node (ambient temperature). Input and disturbance fluxes
are denoted by $u$ and $w$, respectively. The measured output
is denoted by $y$. Conservation equations written at each and
every  node  except  the  datum  node  yield  y  =  .  For 

example,  at  node  5  ,  we  have 

$$w_2 = (T_5 - T_2) + (T_5 - T_4) + (T_5 - T_6),$$

where $T$ denotes the node temperatures; conductances are unity
for simplicity.

The description of the mesh (consisting of conduction, convection
and radiation terms) and the derivation of the associated
linearized model at the steady state are all automated. For this
example, the regulated variables are chosen as the six node
temperatures. Hence, $n_u = 2$, $n_y = 2$, $n_z = 6$ and $n_w = 2$.
Thus $P \in \mathbb{R}^{8 \times 4}$, $K \in \mathbb{R}^{2 \times 2}$ and $u_K \in \mathbb{R}^{2}$. We seek at most

nv(ny  + 1)  =  6  control  variables  and  introduce  n,(n^  + 1)  + 1  =  7 
slack  variables. 

The tradeoff curve in (2) is derived for five cases (see Figure 3).
One extra design approach is introduced to illustrate
that the feedforward and feedback designs are not decoupled. In
other words, minimizing over $u_K$ for $K = 0$, then fixing the
optimal $u_K$ value and optimizing over $K$, is not equivalent
to the simultaneous minimization over $u_K$ and $K$.

Figure 2: Simple mesh

Figure 3: Tradeoff curves in (2) for the five cases:

a - open-loop
b - feedforward only
c - one-at-a-time optimization: first feedforward and then feedback
d - feedback only
e - simultaneous optimization: feedforward and feedback


Conclusions

Using the complete solution to the static linear control design
problem, performance limitations are derived. Repetitive
performance/robustness tradeoff studies can be performed easily
for different sensor/actuator locations and operating points.
The tools are specifically geared towards large scale resistive
network problems. The problem solved here is used to illustrate
the tool and methodology.

Appendix

Given $P$, $w_0$, $\lambda_w$, $z_0$ and nonsingular $\Lambda_w$, $\Lambda_z$, it can be
shown that

$$(K, u_K) = \arg\min_{(K,\,u_K)} \ \max_{\|\eta_w\|_\infty \le \lambda_w} \|\eta_z\|_\infty \quad\text{and}\quad \det(I - P_{yu}K) \neq 0$$

if and only if

$$(K, u_K) = \bigl( Q\,(I + P_{yu}Q)^{-1},\ (I + Q P_{yu})^{-1} u_Q \bigr),$$

where

$$\begin{aligned}
(Q, u_Q) &= \arg\min_{(Q,\,u_Q)} \bigl\| \bigl[\, \lambda_w A(Q) \ \ b(Q, u_Q) \,\bigr] \bigr\|_{i,\infty}, \\
A(Q) &= \Lambda_z^{-1} (P_{zw} + P_{zu} Q P_{yw}) \Lambda_w, \\
b(Q, u_Q) &= \Lambda_z^{-1} \bigl( (P_{zw} + P_{zu} Q P_{yw})\, w_0 - z_0 + P_{zu} u_Q \bigr), \\
\det(I + P_{yu} Q) &\neq 0 .
\end{aligned}$$

A.2  Improving static performance robustness of thermal processes

•  M. Güntekin Kabuli, Robert L. Kosut and Stephen P. Boyd, "Improving static performance
robustness of thermal processes," Proceedings of the 33rd IEEE Conference on Decision and
Control, pp. 62-66, Lake Buena Vista, Florida, December 1994.

Abstract

A static (steady-state) robust control design problem
is considered using a nonlinear model of a thermal
system. For a given operating point, the control
problem is to determine a feedforward/feedback static
controller that minimizes the worst-case static peak
performance deviation from nominal in the presence
of bounded disturbances and parameter variations. It
is desired to obtain the tradeoff between the size of
the worst-case deviation and the size of the uncertainty
set. A complete solution is derived for the
static linear control design problem obtained from
linearization about selected operating points. Efficient
computational tools are developed to rapidly
analyze numerous operating points and control
configurations.

1  Introduction 

Rapid thermal processing (RTP) systems demand
fast tracking control laws that achieve near uniform
spatial temperature distributions across the target,
e.g., a semiconductor wafer, during both transient
and steady-state phases of the process.

In this paper we only address the static (steady-state)
problem using static feedforward/feedback control
laws. The approach relies on a static nonlinear
heat transfer model which includes parameter
uncertainty. The form of the model found to be very
convenient for robust control design is obtained by forming
a mesh of branches that model conduction, convection
and radiation between the nodes of the mesh.
A systematic modeling approach based on the analysis
of large scale nonlinear resistive networks can then
be applied to obtain the equations that determine the
operating points in terms of input and disturbance
biases. This model structure is generic, since all thermal
system models can be put in this form [1, 2].

For a given operating point, the control problem is
posed as designing a feedforward/feedback static
controller that minimizes the worst-case peak deviation
of the performance variables from a nominal point
when subjected to bounded disturbances and parameter
variations. Since the solution to this nonlinear
problem is not known, a sequence of approximations
in terms of the small-signal equivalents is used to
pose static linear control problems. We derive a complete
solution to the associated static linear control
design problem. Considering problem sizes of interest
(e.g., 20 actuators, 20 sensors, 100 regulated variables,
100 exogenous disturbances), efficient computational
solution methods are investigated and prototype
tools are developed to simplify comparative
design studies resulting from different choices of
operating points, actuators, sensors and control laws
(feedforward and/or feedback).

*Research supported by ARPA under AFOSR contract

The paper is organized as follows: in Section 2 we
pose and solve the static linear sensitivity problem,
i.e., parameter uncertainties are included as additional
exogenous input perturbations. The tools required
here are also needed for the robustness problem.
An example of the sensitivity tradeoffs using
the thermal mesh model is given in Section 3.
Robustness results for real parametric uncertainties are
given in Section 4. An example, using the developed
tools, is given in Section 5. To conserve space, only
a very brief discussion of the computational issues
and methods is provided. Further details on the
optimization methods and proofs can be obtained from
the authors.

2  Sensitivity  Tradeoff 

2.1  Problem  Description 

Consider the feedback interconnection shown in Figure 1,
where $w$, $z$, $u$ and $y$ denote the exogenous
inputs, controlled outputs, actuator inputs and measured
outputs, respectively; $P \in \mathbb{R}^{(n_z+n_y)\times(n_w+n_u)}$
denotes a static linear plant, $K \in \mathbb{R}^{n_u \times n_y}$ denotes
a static linear feedback controller, and $u_K \in \mathbb{R}^{n_u}$
denotes the static feedforward control.

Figure 1: Static feedback system

To  address  the  sensitivity  to  parameter  variation,  the 
exogenous  input  u;  includes  disturbances  as  well  as 
parameter  perturbations  from  nominal.  As  such  it  is 
convenient  to  explicitly  account  for  the  nominal  as 
well  as  deviations.  Hence,  let  the  normalized  exoge¬ 
nous  input  be  given  by 

Hu;  = 

where  wq  G  is  the  nominal  value  and  is  a 

diagonal  scaling  matrix.  A  normalized  output  r?,  is 
defined  accordingly: 

m  =  ^7^(2  -  2o)  . 

Motivated by the temperature uniformity requirements
of RTP problems, we are naturally led to consider
the "infinity" norm as the appropriate measure
of signal size. Thus, for $x \in \mathbb{R}^n$, the infinity norm is
defined as $\|x\|_\infty = \max_{1 \le k \le n} |x_k|$. Recall also that
the induced matrix norm is the "max-row-sum", i.e.,

$\|A\|_{i,\infty} = \max_{\|x\|_\infty \le 1} \|Ax\|_\infty = \max_i \sum_j |a_{ij}|$.

We can now state the fundamental design problem:

•  Optimal Design Tradeoff: For a given $P$,
$z_0$, $\Delta_w$, and $\Delta_z$, find the optimal tradeoff between
disturbance size $\|\eta_w\|_\infty$ and performance
tolerance $\|\eta_z\|_\infty$, i.e., determine the graph:

$\{(\lambda_w, \lambda_z) \mid \lambda_w \ge 0\}$,  $\lambda_z = \min_{K,\,u_{ff}} \ \max_{\|\eta_w\|_\infty \le \lambda_w} \|\eta_z\|_\infty$.  (1)

The graph in (1) of $\lambda_z$ vs. $\lambda_w$ gives the minimum relative
change ($\lambda_z$) uniformly in all performance variables
($z$) for a relative uniform change ($\lambda_w$) in all
exogenous input variables ($w$). Hence, the graph denotes
the boundary between feasible and infeasible
designs on the $(\lambda_w, \lambda_z)$-plane. A typical tradeoff
curve - the graph described in (1) - is shown by
the solid line in Figure 2. The shaded region below
this tradeoff curve is infeasible, i.e., there exists
no combination of feedforward or feedback which can
achieve the requested performance. Conversely, all
(1, 1)-feasible designs correspond to points on the
segment between designs A and B in Figure 2. Note
that A corresponds to a performance design and C
corresponds to a robust design, i.e., design C allows
a much larger uncertainty for the requested specification
$\|\eta_z\|_\infty \le \lambda_z = 1$. In general, the graph that
partitions the feasible and infeasible regions is neither
convex nor concave.

Figure 2: Graph in (1) denoting the boundary of the
feasible region in the $(\lambda_w, \lambda_z)$-plane; shaded region is
infeasible.

Due to the two free design parameters $K$ and $u_{ff}$,
four possible problems can be posed, namely:


•  open-loop ($u = 0$),

•  feedforward only ($u = u_{ff}$),

•  feedback only ($u = Ky$),

•  feedforward/feedback ($u = u_{ff} + Ky$).

Typical design specifications involve the following:

•  feasibility: determine (if possible) a
$(\lambda_w, \lambda_z)$-feasible design

•  robustness: for a fixed $\lambda_z$, maximize $\lambda_w$ among
$(\lambda_w, \lambda_z)$-feasible designs

•  performance: for a fixed $\lambda_w$, minimize $\lambda_z$ among
$(\lambda_w, \lambda_z)$-feasible designs.

The feasibility, robustness and performance design
problems stated above can all be expressed in terms
of the tradeoff curve denoted by the graph in (1). In
fact, it can be shown that the four possible choices of
control ($u = 0$, $u = Ky$, $u = u_{ff}$, $u = u_{ff} + Ky$) result
in special cases of finding a solution to a problem
of a max-row-sum norm minimization of the form:

$\min_X \|T_1 + T_2 X T_3\|_{i,\infty}$,  (2)

where the $T$'s and $X$ are defined in accordance with
the four cases.
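
As a small numerical illustration of the max-row-sum norm that appears in (2), the following sketch computes the norm of a matrix directly and checks it against a brute-force maximization of $\|Ax\|_\infty$ over the sign vectors $x \in \{-1, 1\}^n$, which are the vertices of the unit infinity-norm ball. The function names and the sample matrix are illustrative assumptions and are not part of the tools described in this paper.

```python
from itertools import product


def inf_norm(x):
    """Infinity norm of a vector: largest absolute entry."""
    return max(abs(v) for v in x)


def max_row_sum(a):
    """Induced infinity norm of a matrix: largest absolute row sum."""
    return max(sum(abs(v) for v in row) for row in a)


def matvec(a, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def brute_force_induced_norm(a):
    """Maximize ||A x||_inf over the vertices of the unit box."""
    n = len(a[0])
    return max(inf_norm(matvec(a, x)) for x in product((-1.0, 1.0), repeat=n))


# Hypothetical 3 x 2 example matrix (not taken from the paper).
A = [[1.0, -2.0], [0.5, 0.25], [-3.0, 1.0]]
norm_direct = max_row_sum(A)
norm_brute = brute_force_induced_norm(A)
```

The two values agree because a linear function of $x$ attains its maximum absolute value over the box at a vertex; the brute-force version is exponential in $n$ and is only meant as a check.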

2.2  Positive  Definite  Programming 

To efficiently solve the above max-row-sum problem
we utilize the primal-dual potential reduction methods
described in [4]. The computational tools we have
developed resolve the following difficulties: 1) initialization
of primal and dual variables; 2) efficient approximate
solutions to the (huge) least-squares problems
to determine the analytic center; 3) perturbation
of the updated dual parameters. A further
explanation of these very important but esoteric issues
is outside the scope of this paper. Interested
readers can request details from the authors. For
the intended applications, $n_z n_w \gg 1$.

The particular solution approach reduces the original
$(n_z n_w + n_z + n_u n_y + n_u + 1)$-unknown least-squares
problem to an $(n_u n_y + n_u)$-unknown least-squares
problem. Hence the computational complexity
is determined by the control variables: $n_u n_y$ for
feedback and $n_u$ for feedforward.

3  Example:  Sensitivity  Tradeoff

The mesh in Figure 3 represents a resistive network
consisting of seven nodes and nine branches. The
mesh describes conduction, convection and radiation
effects as non-linear resistive elements. (The authors
have used an Xmath script to automate the mesh generation
and the associated linearized model at the
steady-state.) Following standard node analysis results
for linear resistive networks (see e.g. [1]), the
steady-state heat-flux conservation equations arising
from application of the Kirchhoff Current Law at each
and every node (except the reference (datum) node)
result in:

$0 = A_c G A_c^T x + A_u u + A_w w$,  $y = Cx$.  (3)

The node variables $x$ correspond to node temperature
minus the ambient temperature, $u$ denotes the control
input fluxes, and $w$ the disturbance fluxes. The
measured node temperatures are denoted by $y$. The
matrix $[A_c \ A_u \ A_w]$ is defined by the incidence matrix
that describes the interconnection of branches;
its entries are 0's, 1's or -1's. The matrix $G$ is diagonal,
consisting of nominal branch conductances. For
this example, the regulated variables are chosen as
the six node temperatures. Hence, $n_w = 2$, $n_u = 2$,
$n_z = 6$ and $n_y = 2$. Thus $P \in \mathbb{R}^{8 \times 4}$, $K \in \mathbb{R}^{2 \times 2}$
and $u_{ff} \in \mathbb{R}^2$. We seek at most $n_u(n_y + 1) = 6$
control variables and introduce $n_z(n_w + 1) + 1 = 19$
slack variables. Following the notation in Section 2,
let $w_{0,i} = 1$, $i = 1, 2$; $\Delta_w = I$; $z_{0,i} = 3$, $i = 1, \ldots, 6$;
$\Delta_z = I$.
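
To make the node analysis in (3) concrete, the sketch below solves the steady-state equation for a deliberately tiny network: two temperature nodes plus the datum node, connected by three branches. The network, the conductance values, the sign convention for the input columns (a flux injected into a node enters with coefficient -1), and all function names are assumptions made for this illustration; they are not the seven-node mesh of Figure 3.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def solve(m, rhs):
    """Solve m x = rhs by Gaussian elimination with partial pivoting."""
    n = len(m)
    aug = [list(m[i]) + [rhs[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


# Reduced incidence matrix: branch 1 joins node 1 to the datum, branch 2
# joins node 1 to node 2, branch 3 joins node 2 to the datum.
A_c = [[1.0, 1.0, 0.0],
       [0.0, -1.0, 1.0]]
g = [1.0, 2.0, 1.0]          # hypothetical branch conductances
A_u = [[-1.0], [0.0]]        # control flux injected at node 1
A_w = [[0.0], [-1.0]]        # disturbance flux injected at node 2


def node_temperatures(u, w):
    """Solve 0 = A_c G A_c^T x + A_u u + A_w w for x."""
    G = [[g[i] if i == j else 0.0 for j in range(3)] for i in range(3)]
    M = matmul(matmul(A_c, G), transpose(A_c))
    rhs = [-(A_u[i][0] * u + A_w[i][0] * w) for i in range(2)]
    return solve(M, rhs)


x = node_temperatures(5.0, 0.0)
```

Here $A_c G A_c^T$ is the familiar nodal conductance matrix, so the same routine applies to larger meshes once the incidence matrix has been generated.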


Figure  3:  Sample  mesh 


The tradeoff curve in (1) is derived for five cases (see
Figure 4). One extra design approach is introduced
to illustrate that the feedforward and feedback designs
are not decoupled. In other words, minimizing
over $u_{ff}$ for $K = 0$, then fixing the optimal $u_{ff}$
value and optimizing over $K$, is not equivalent to
the simultaneous minimization over $u_{ff}$ and $K$.


Figure 4: Tradeoff curves in (1) for the five cases (horizontal axis: $\lambda_w$):
a - open-loop
b - feedforward only
c - one at a time optimization; first feedforward and then feedback
d - feedback only
e - simultaneous optimization: feedforward and feedback



4  Robustness  Analysis 

The control design problem formulated in Section 2 is
based on a nominal plant model $P : (w, u) \mapsto (z, y)$,
where $w$ includes perturbations in uncertain parameters.
In this section, we consider the performance
of the nominal control $u = u_{ff} + Ky$ subject to perturbed
plant models as shown in Figure 5, where $w$
now includes only exogenous disturbances.


Figure  5:  Perturbed  plant  model 

Although we still consider linearized plant models,
the uncertainty is maintained in its natural form (Figure 5).
In the example to follow we take $\Delta$ as a
diagonal matrix whose entries correspond to uncertainties
in the branch conductance matrix $G$. Hence,
the perturbed form of the linear resistive network (3)
becomes:

$0 = A_c (G + \Delta) A_c^T x + A_u u + A_w w$.  (4)

In order to define the uncertainty set, let $e_i$ denote the
ith standard basis vector. Its dimension is determined
from the context. Now, the uncertainty structure and
vertex set are defined as follows:

$\mathcal{D} = \{\Delta \in \mathbb{R}^{n_v \times n_\xi} \mid \Delta_l \le \Delta \le \Delta_u\}$,  (5)

$\mathcal{V}_{\mathcal{D}} = \{\Delta \in \mathcal{D} \mid e_i^T \Delta e_j \in \{e_i^T \Delta_l e_j,\ e_i^T \Delta_u e_j\},\ 1 \le i \le n_v,\ 1 \le j \le n_\xi\}$.  (6)


Let $F_{zw}(\Delta)$ denote the fractional form obtained for
$u = Ky$ (for $u = u_{ff} + Ky$, $w$ gets augmented by one
entry). From Figure 5,

$F_{zw}(\Delta) = H_{11} + H_{12}\Delta(I - H_{22}\Delta)^{-1}H_{21}$.

Let $\det(I - H_{22}\Delta) \ne 0$ for all $\Delta \in \mathcal{D}$. Under these
assumptions, it can be shown that:

$\max_{\Delta \in \mathcal{D}} \|F_{zw}(\Delta)\|_{i,\infty} = \max_{\Delta \in \mathcal{V}_{\mathcal{D}}} \|F_{zw}(\Delta)\|_{i,\infty}$.  (7)

This result means that the worst-case maximum row
sum of the linear fractional form over $\mathcal{D}$ is
achieved at the vertex set $\mathcal{V}_{\mathcal{D}}$. We utilize this result
in the following section, in a design example. It
can also be shown that $\min_{\Delta \in \mathcal{D}} \|F_{zw}(\Delta)\|_{i,\infty}$ is not
necessarily achieved at the vertex set.

5  Example:  Robustness  Analysis

We consider a mesh consisting of 40 linear conductance
branches and 25 temperature nodes. As in (3),
$[A_c, A_u, A_w]$ is the incidence matrix
and consists of 0's, 1's and -1's to denote the directed
graph associated with the mesh. $u \in \mathbb{R}^{n_u}$ and
$w \in \mathbb{R}^{n_w}$ denote the control input and exogenous input
fluxes, respectively. $y \in \mathbb{R}^{n_y}$ denotes the measured
node temperatures. The node temperatures
satisfy (3) where $G = \mathrm{diag}(g)$. Nominal operating
conditions (denoted by 0 subscripts) are determined
to minimize the deviation from a uniform temperature
profile across the measured nodes. The nominal
operation is determined by

$A_c\, \mathrm{diag}(g_0)\, A_c^T x_0 + A_u u_0 + A_w w_0 = 0$,

where $g_0 > 0$ and $w_0 > 0$. Let the performance variable
$z$ be defined in terms of $y$, where $z_0 > 0$.

The design problem is posed as follows: determine
$u = u_{ff} + Ky$ such that the performance measure

$\varphi(u_{ff}, K) = \max \|\Delta_z^{-1}(z - z_0)\|_\infty$  subject to  $\|\Delta_w^{-1}(w - w_0)\|_\infty \le 1$,  $\|\Delta_g^{-1}(g - g_0)\|_\infty \le 1$

is minimized for the feedback perturbed plant model
in Figure 5; we choose $\Delta_w = \mathrm{diag}(w_0)$, $\Delta_g =
0.1\,\mathrm{diag}(g_0)$ and $\Delta_z = \mathrm{diag}(z_0)$. Recall that by (7),
the performance measure is achieved at the vertices.
Note that we allow a 10% uncertainty in all 40 conductance
branches. Hence, for a given design, the
exact measure could have been obtained by $2^{40}$ max
row sum evaluations. To make the calculations reasonable,
we will rate the designs according to 10% uncertainty
in the first 10 conductance branches; hence
determine a lower bound on $\varphi$ (but this is exact if
only 10 parameters are varied).

In the first design approach, we take $G = G_0$ and
solve

$(u_{ff,1}, K_1) = \arg \min_{(u_{ff}, K)} \ \max \|\Delta_z^{-1}(z - z_0)\|_\infty$  subject to  $\|\Delta_w^{-1}(w - w_0)\|_\infty \le 1$,  $G = G_0$.

Thus, the control law is optimal under perfect plant
modeling.

In the second design approach, the first order approximation
to the plant

$A_c\, \mathrm{diag}(g_0)\, A_c^T (x - x_0) + A_u (u - u_0) + A_w (w - w_0) + A_c\, \mathrm{diag}(g - g_0)\, A_c^T x_0 = 0$  (8)

is used to solve for the optimal sensitivity controller

$(u_{ff,2}, K_2) = \arg \min_{(u_{ff}, K)} \ \max \|\Delta_z^{-1}(z - z_0)\|_\infty$  subject to  $\|\Delta_w^{-1}(w - w_0)\|_\infty \le 1$,  $\|\Delta_g^{-1}(g - g_0)\|_\infty \le 1$.

In other words, the input to the $\Delta$ block in Figure 5 is
at $\xi_0 = A_c^T x_0$, and the exogenous inputs $w$ are
augmented by 40 more entries to account for $(g - g_0)$
in (8). The results are summarized in Table 1.


Design                               $g = g_0$    $0.9 g_0 \le g \le 1.1 g_0$
$\varphi(u_0, 0)$ (open loop)        6.99%        ≥ 13.37%
first design                         3.47%        ≥ 223.45%
second design                        3.67%        ≥ 6.63%

Table 1: Performance ratings


Note that the first row in Table 1 corresponds to the
open-loop performance. The input is set to $u_0$; hence
the performance measure reflects the relative change
in $y$ about $y_0$. At the nominal conductance parameters,
the worst-case deviation is 6.99%, and when the first
10 conductances have 10% uncertainty, the worst-case
deviation is 13.37%. Note that the second column of
Table 1 is a lower bound on the performance measure
$\varphi$ since the parameter perturbations are restricted to
the first ten branches only. Recall that the first design
did not take into account any parametric uncertainty.
Hence the nominal performance is better than
the open-loop; however, 10% parametric uncertainty
can cause a deviation more than twice the nominal
$z_0$. In the second design, by augmenting the exogenous
inputs $w$ with the first-order effect of the
parametric changes, a more cautious nominal design
is achieved. With uncertainty in the first ten parameters,
the worst-case deviation is now half of the open-loop
deviation, although the nominal performance is
slightly worse than the first nominal design.
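
The vertex evaluation used to rate the designs can be sketched in a few lines. The example below is a simplification, not the computation reported in Table 1: it assumes $H_{22} = 0$, so that $F(\Delta) = H_{11} + H_{12}\Delta H_{21}$ is affine in a diagonal $\Delta$ and the vertex property follows from convexity of the norm. The matrices, bounds and function names are hypothetical.

```python
from itertools import product
import random


def max_row_sum(a):
    """Induced infinity norm: largest absolute row sum."""
    return max(sum(abs(v) for v in row) for row in a)


def f_affine(h11, h12, h21, delta):
    """F(Delta) = H11 + H12 diag(delta) H21, i.e. the case H22 = 0."""
    m = len(delta)
    return [[h11[i][j] + sum(h12[i][k] * delta[k] * h21[k][j] for k in range(m))
             for j in range(len(h11[0]))] for i in range(len(h11))]


def worst_case_over_vertices(h11, h12, h21, lo, hi):
    """Evaluate the max-row-sum at all 2**m vertices of the box [lo, hi]."""
    worst = 0.0
    for vertex in product(*zip(lo, hi)):
        worst = max(worst, max_row_sum(f_affine(h11, h12, h21, vertex)))
    return worst


# Hypothetical data with two uncertain parameters, each within +/- 0.1.
H11 = [[1.0, 0.5], [0.0, 2.0]]
H12 = [[1.0, 0.0], [0.5, 1.0]]
H21 = [[1.0, -1.0], [0.0, 1.0]]
lo, hi = [-0.1, -0.1], [0.1, 0.1]

worst = worst_case_over_vertices(H11, H12, H21, lo, hi)

# Interior samples should never exceed the vertex value.
rng = random.Random(0)
samples = [[rng.uniform(l, h) for l, h in zip(lo, hi)] for _ in range(200)]
sampled_max = max(max_row_sum(f_affine(H11, H12, H21, d)) for d in samples)

# Vertex counts for 40 and for 10 uncertain conductances.
n_vertices_all, n_vertices_reduced = 2 ** 40, 2 ** 10
```

The vertex count doubles with every uncertain parameter, which is why the ratings above restrict the perturbations to the first ten branches (1024 vertices instead of roughly $1.1 \times 10^{12}$).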

6  Further  Robustness  Analysis 


A little more notation is needed for this section. For
$A \in \mathbb{R}^{n \times n}$, $\rho_R(A)$ denotes the maximum absolute
value of the real eigenvalues of $A$. For a real matrix
$A$, $|A|$ denotes the absolute value of $A$, i.e.,
$e_i^T |A| e_j = |e_i^T A e_j|$. $\Pi(A) = \rho_R(|A|) = \rho(|A|)$ is
also referred to as the Perron eigenvalue of $A$. Let
$\mathbf{1} \in \mathbb{R}^n$ denote a vector of all 1's.
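
As an illustration of the Perron eigenvalue $\Pi(A) = \rho(|A|)$, the sketch below estimates it by power iteration on the entrywise absolute value of $A$. This is a generic textbook technique, not the authors' implementation; it assumes $|A|$ is primitive (for instance, all entries nonzero) so that the iteration converges, and the function name and test matrix are hypothetical.

```python
def perron_eigenvalue(a, iters=200):
    """Estimate rho(|A|) by power iteration on the entrywise absolute value.

    Assumes |A| is primitive (e.g. all entries nonzero); otherwise the
    iteration may not converge to the spectral radius.
    """
    n = len(a)
    abs_a = [[abs(v) for v in row] for row in a]
    x = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        y = [sum(abs_a[i][j] * x[j] for j in range(n)) for i in range(n)]
        lam = max(y)          # y is nonnegative, so this is its infinity norm
        if lam == 0.0:
            return 0.0        # |A| is the zero matrix
        x = [v / lam for v in y]
    return lam


# Hypothetical example: |A| = [[1, 2], [3, 4]] has Perron eigenvalue
# (5 + sqrt(33)) / 2.
A = [[1.0, -2.0], [3.0, -4.0]]
pi_A = perron_eigenvalue(A)
```

Because only matrix-vector products with a nonnegative matrix are needed, each evaluation is cheap compared with a full eigenvalue decomposition.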


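Let $F_{zw}(\Delta) = H_{11} + H_{12}\Delta(I - H_{22}\Delta)^{-1}H_{21}$,

where $H = \begin{bmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{bmatrix}$

and $\Delta$ is diagonal. Under these assumptions,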
min  7 

ll^-(A)||,- «  <  7 
llAlkoo  <  ^ 
det(/  —  H^22-^)  7^  0 


max  p 
€  €  {ci,. - 

w  e  {-1.1}'''- 

S  £  diagf-l.l} 


1  0  ]\ 

\L  H211U  H22  j 

1  0  s  Jj 

< 

max 

n 

f[ 

H\\W 

Hii 

*  €  {ei. 

•,Cn 

} 

VI 

H21W 

Hii 

u)  e  {- 

1. 

ir- 

< 

max  n 

( 

1 1 

ef  Hi2 

1  <i<n* 

\ 

1 

Hii  1 

I 

H22 

J 

< 

■  Hn 

Hii  ' 

Hii 

Hii 

A cheap lower bound to the optimal $\gamma$ value above can
be obtained by evaluating $\rho_R$ of a smaller number of
matrices rather than the huge number in the maximization above.
Note that typically, $n_z \gg n_w$. Using the Perron
eigenvalues, coarser upper bounds on the optimal $\gamma$
can be obtained by $n_z 2^{n_w}$ and $n_z$ eigenvalue evaluations,
respectively.


Note also that if the entries of $H$ are all positive, then
the optimal $\gamma$ is given by

efi 

Hji  1  Hi, 


nun 


7  =  max 

<;  mf  l<l<n 

\m.^  <  f 

det(/  —  H32A)  ^  0 


=  max  n  ( 
\L 


])■ 


7  Conclusion 


The sensitivity and robustness to parameter uncertainty
of operating points of thermal processes has
been investigated using static feedforward/feedback
control. The problem is motivated using a large-scale
linear resistive network. Efficient computational tools
are developed to handle a large number of nodes and
branches. Successive design studies involving different
operating points and actuator/sensor selections can
be easily performed.
ABSTRACT

Precise trajectory following in the presence of actuator saturation
constraints is important to the performance of many
control systems. An approximate finite-time tracking problem
is formulated for a multivariable discrete-time linear
time-invariant system. The actuator saturation constraints
are taken into account explicitly. The problem is set up as a
set of linear programming problems using a transfer matrix
approach. The approach is applied to temperature profile
tracking in rapid thermal processing (RTP) systems. It is
known that fast and precise temperature tracking is crucial
to the performance of RTP systems. The actuator saturation
manifests itself in terms of the power driving the lamps.
Two rapid thermal processing examples are presented to
illustrate the validity of the approach.

1  Introduction 

Precise trajectory following is important to the performance of
many control systems. The problem of finite-time tracking
in the presence of actuator saturation has been a generic
problem in control theory. In servomechanisms, a velocity
profile is to be followed precisely. In many thermal systems,
a temperature profile is to be followed quickly and
accurately. The performance in many of these systems is
tied to how fast the desired trajectory can be followed. In
almost all systems, actuators saturate because of limited
dynamic range. For example, a valve saturates when it
is completely open or closed. The control surfaces on an
aircraft can be moved by a certain angle from their nominal
positions. In thermal systems, heaters or lamps can be
driven between minimum and maximum power settings.
For example in rapid thermal processing (RTP), precise
temperature trajectory following is crucial and the actuators
(lamps) have a finite maximum power driving them.
Furthermore, in these problems the minimum actuator setting
is zero power so that the saturation nonlinearity is not
symmetric. Therefore, the problem of fast tracking in the
presence of actuator saturation has been a generic problem
in control theory.


*This research is supported in part by ARPA under AFOSR
Contract F49620-94-C-0003.


Kalman [1] was the first to realize that the problem can
be addressed effectively if it is formulated in a discrete-time
setting. Schmidt [2] solved the problem for low order
single-input-single-output systems. This problem has been
previously addressed by the present authors in reference [3]
in a continuous-time setting with applications to control of
flexible structures. Other related work is contained in [4],
[5], [6]. In this paper, a solution to finite-time tracking
with actuator saturation is proposed for multivariable
linear time-invariant discrete systems. The approach relies
on the transfer matrix to formulate constraints on the set
of admissible finite duration control signals that achieve
precise point-to-point trajectory following. The problem
is first formulated as an open-loop problem. The shape of
the best input signal to achieve the finite-time tracking is
derived. We then consider the closed-loop implementation
of the problem so that the same control signal is produced
at the input of the plant. The actuator saturation is taken
into account explicitly. The problem is set up as a set
of linear programming problems. This represents the first
complete solution to the problem. The motivating example
for the present research is fast temperature tracking in
rapid thermal processing of semiconductor wafers. Therefore,
a short description of RTP appears next before the
problem formulation.

2  Rapid  Thermal  Processing 

A variety of different steps are involved in semiconductor
microelectronics manufacturing. These steps include
oxidation, lithography, epitaxial film growth (epi), annealing,
CVD, etc. Each of these steps is a distinct part of
the process and uses associated processing equipment. An
important state-of-the-art technique to perform some of
these steps is RTP. This technique has major advantages
over conventional furnace-based batch thermal processing
of semiconductor wafers. In the conventional furnace-based
techniques, the processing step involves several hours, and
the speed is limited by the large thermal masses of the
walls. In contrast, in RTP only the wafer mass is heated
or cooled and the RTP walls are water cooled and kept
at room temperature. This cuts down the processing time
to seconds. From a manufacturing point of view, RTP
fits naturally into the current cluster-tool concepts which
promise IC fabrication lines which are more flexible, and
much less capital intensive as compared to present billion
dollar state-of-the-art fabs. RTP is an essential technology
for single-wafer processing. RTP's viability has
been demonstrated for process steps such as silicidation,
RTCVD, and annealing [8]. It has also been proposed as
an efficient way to clean wafers. The key enabling technology
for application of RTP in a manufacturing setting
is the precise temporal and spatial control of temperature.
Sensors are now available [9] but control design needs further
development.

Typical RTP systems are described in [7]. A diagram of a
typical system is shown in Figures 1-2 [7]. The wafer is
heated by radiation via a lamp array. In one design (Figure 2),
rings of tungsten-halogen lamp arrays are used as
heaters (actuators) and are separated from the chamber by
a quartz window. The lamp array has a hexagonal packing.
The wafer is heated only from the top side. There are
several other competing alternatives for the lamp design
(e.g., a two array scheme [7]). However, precise temperature
control is required regardless of the lamp design. The
lamp voltage requirements are from 20 to 200 volts but
are dependent on the chamber geometry, etc. The chamber
has a large number of inputs and outputs. It uses
advanced pyrometers as temperature sensors (see Figure
3). The system is to follow a pre-defined temperature profile
(ramp up, hold, cool down) such as shown in Figure
4, and accurate tracking of the temperature profile is required
along with minimal overshoots at transitions and
minimal spatial temperature variations during all phases
of the profile. This must be done to ensure that all wafers
are processed the same way so as to achieve repeatability.
Some designs keep the wafer fixed, whereas in others the
wafer is rotated during the processing cycle. The wafer
rotation can induce angular disturbances which must be
rejected. Other disturbances in the system are low frequency
ones, e.g., the heat transferred to the wafer from
the quartz window when the lamps are first turned on. To
meet requirements for 0.25 μm devices, SEMATECH has
established 1995 goals for temperature uniformity across
the wafer of ±3°C for oxide and ±5°C for other processes.
This must be so to ensure that processing is uniform across
the wafer and thermal stress does not result in wafer defects
(warping, slip). Accurate and repeatable temperature
control is required at both low temperatures and high
temperatures. Temperature uniformity is required over the
entire wafer in spite of the fact that temperature is being
measured only at a finite number of points. The control
system must deal with actuator saturations (corresponding
to maximum lamp power setting or intensity of 200 volts).
This is a particularly important problem in RTP because
of the wide ranges of process temperature setpoints. Other
factors affecting system performance include wafer diameter,
chamber geometry, gas flow uniformity and cooling of
the chamber.

Figure 1: Cross-section of generic RTP system. (Not shown:
quartz pins to support wafer, gas inject and exhaust ports.)

Figure 2: One-sided heating with hexagonal array of
tungsten-halogen lamps (bottom view of lamp head).

Precise temperature control is critical to this promising
technology. In these systems, many heaters affect the temperature
at each location where it is measured. Control
that explicitly accounts for the influence of each heat source
on each temperature sensor is needed for acceptable performance.
With such strong physical coupling, it is extraordinarily
difficult to obtain acceptable control of the temperature
profile using single loop conventional controllers
commonly used in industrial applications. Moreover, since
previous approaches relied heavily on precise calibration,
slight changes in chamber design or wafer geometry can
require substantial and time-consuming efforts in control
redesign. The necessity for meeting extremely high performance
specifications requires that the control system
be optimal with respect to the specific process being controlled.
As a result, model-based multivariable control system
design is a must.


Figure  3:  Sensor  Locations 


The approach presented in this paper will also allow us to
explicitly account for actuator saturations that have been
handled in an ad hoc fashion in current designs. In addition,
the methodology is general so that it can be applied
to other processes.

3  Problem  Formulation  and 
Main  Result 


Let the plant, $P$, have a minimal state-space description
with $n_c$ inputs, $n_o$ outputs and $n_s$ state variables.
Hence,

$x(k+1) = Ax(k) + Bu(k)$  (1)

$y(k) = Cx(k)$.  (2)

Assume that the plant, $P$, is at rest, i.e., $x_0 = 0$. We are
interested in the rest-to-rest maneuvering of the system
subject to actuator saturation. Specifically, let each of the
inputs have a specified actuator constraint,

$u_{i,\min} \le u_i \le u_{i,\max}$,  $i = 1, 2, \ldots, n_c$.  (3)

Figure 4: Typical RTP Cycle (for oxide growth).

For a given integer $N$, let $\mathcal{U}_N$ denote the set of all control
inputs bounded by Eq. (3) but of duration $N$ time steps.
We say that the system tracks a reference input $r$ iff

$y(k) = r(k)$ for all $k \ge N$.  (4)

The goal is to find $u \in \mathcal{U}_N$ such that Eq. (4) holds.

Assume that the plant is internally stable and its transfer
matrix has been written in the pole-residue form (assuming
distinct poles),

$P(z) = C(zI - A)^{-1}B = \sum_{i=1}^{n_s} \frac{R_i}{z - \lambda_i}$  (5)

where

$R_i = C v_i q_i^T B$,  $i = 1, \ldots, n_s$.  (6)

$v_i$, $q_i$ are the right and left eigenvectors of $A$ respectively.
Suppose it is desired to follow a constant reference
input. We may decompose the input $U(z)$ into two parts:
an N-tap FIR and a steady-state part as follows,

$U(z) = U_{FIR}(z) + U_{ss}(z)$.  (7)

Specifically, we may represent the input as,

$U(z) = \sum_{i=0}^{N} p_i z^{-i} + \bar{p}\, \frac{z^{-N}}{z - 1}$  (8)

where $p_i$ and $\bar{p}$ are vectors of size $n_c$. The first part of the
right hand side of Eq. (8) is a set of N-tap FIR filters whose
effect vanishes exactly after $N$ time steps. The second term
is a vector of delayed steps. The control sequence is then,

$u(k) = \sum_{i=0}^{N} p_i\, \delta_{k-i} + \bar{p}\, 1(k - N - 1)$  (9)

where $\delta_k$ is the unit pulse.


We may also decompose the output of the system into an
N-tap FIR part and a steady-state component,

$Y(z) = C(zI - A)^{-1}BU(z) = Y_{FIR}(z) + Y_{ss}(z)$.  (10)

The  desired  steady-state  output  is, 

W  =  (n) 

where $r_0$ is the vector of constant output levels. We may substitute
Eqs. (8) and (11) in Eq. (10) and equate the residues for
the plant poles, $z = \lambda_i$, as well as the pole of the input
signal at $z = 1$. For $z = 1$ we obtain,


and for each $\lambda_i$ we have,

^  -  Ar^ 

Y,  V  +  Si  =  0  i  =  1 . n..  (13) 
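
The residue matching can be sketched as follows. This is an illustrative derivation under stated assumptions rather than a transcription of Eqs. (12) and (13): it assumes the input has the form $U(z) = \sum_{k=0}^{N} p_k z^{-k} + \bar{p}\, z^{-N}/(z-1)$, writes $R_i$ for the residue matrices of the pole-residue expansion in Eq. (5), writes $r_0$ for the constant output level, and assumes the plant poles $\lambda_i$ are distinct, nonzero and different from 1.

```latex
\begin{align*}
Y(z) &= \sum_{i=1}^{n_s} \frac{R_i}{z-\lambda_i}\, U(z), \\
\operatorname{Res}_{z=\lambda_i} Y(z) &= R_i\, U(\lambda_i)
  = R_i \Big( \sum_{k=0}^{N} p_k \lambda_i^{-k}
      + \bar{p}\, \frac{\lambda_i^{-N}}{\lambda_i - 1} \Big) = 0,
  \qquad i = 1,\dots,n_s, \\
\operatorname{Res}_{z=1} Y(z) &= P(1)\,\bar{p} = r_0 .
\end{align*}
```

The first set of conditions removes the plant modes from the output, so that only a finite-duration part and a step remain; the last condition fixes the steady-state level through the DC gain $P(1)$. Both are linear in the unknowns $p_k$ and $\bar{p}$, which is what allows the saturation bounds of Eq. (3) to be added as linear inequality constraints.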

Given  N ,  the  problem  may  then  be  formulated  as  a  linear 
programming  (LP)  problem, 


max  A 


A series of convex minimization problems can be solved by
varying $N$ to obtain a solution to the finite-time tracking
problem which satisfies the hard limits on the actuators.
The design procedure is as follows:

Step 1. Select a large enough $N$ (depending on plant dynamics).

Step 2. Solve the linear program. If feasible, decrease
$N$ and repeat until infeasible. If infeasible, increase $N$.
Continue until a satisfactory solution is obtained.

Step 3. Form the input sequence, $u(k)$, and evaluate the
tracking performance.

Note that at any $N$ for which $\lambda \ge 1$, scaling of the control
signal by $\lambda$ achieves the desired tracking and satisfies
the saturation constraint. However, the extremal solution
(i.e. the smallest $N$) will correspond to the case where the
optimal $\lambda = 1$. The solution to this problem has an interpretation
from the classical deadbeat control concepts. By
construction, $U(z)$ will always contain zeros to cancel all
the (stable) plant poles. The combination of $P(z)U(z)$ will
have $N$ poles at the origin. Hence constant inputs will be
tracked in $N$ time steps. The locations of zeros of $U(z)$
have a nice geometrical pattern as seen in the following examples.
Note that steps 1-3 are performed off-line and the
resulting input sequences are stored in a table for on-line
use.

4  Example  1:  SISO  RTP  Plant


This is a model of an RTP system represented by the first
order plant

$P(z) = \frac{2.3164}{z - 0.9964}$.


The tracking is achieved in $N = 3$ time periods as shown
in Fig. 5. The zeros of $U(z)$ are located at $0.4982 \pm j0.8629$
and at the plant pole at 0.9964. The zeros of $U(z)$ appear
inside the unit circle in a Butterworth pattern.
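
For a scalar plant of this kind, the search over $N$ can be illustrated without a linear programming solver. The sketch below is not the LP formulation of Section 3; it relies on the fact that for $x(k+1) = a\,x(k) + b\,u(k)$ with $0 < a < 1$, $b > 0$ and $0 \le u \le u_{\max}$, the set of states reachable from rest in $N$ steps is the interval $[0,\ b\,u_{\max}(1 + a + \cdots + a^{N-1})]$. The setpoint `r`, the limit `u_max` and the function names are hypothetical values chosen for illustration; only `a` and `b` come from the plant above.

```python
def min_horizon(a, b, r, u_max, n_limit=1000):
    """Smallest N for which r is reachable from rest with 0 <= u <= u_max.

    Assumes 0 < a < 1, b > 0 and r > 0, so the reachable set after N steps
    is the interval [0, b * u_max * (1 + a + ... + a**(N - 1))].
    """
    geom = 0.0
    for n in range(1, n_limit + 1):
        geom += a ** (n - 1)
        if b * u_max * geom >= r:
            return n, geom
    raise ValueError("target not reachable within n_limit steps")


def simulate(a, b, inputs):
    """Simulate x(k+1) = a x(k) + b u(k), y = x, starting from rest."""
    x, ys = 0.0, []
    for u in inputs:
        ys.append(x)
        x = a * x + b * u
    ys.append(x)
    return ys


a, b = 0.9964, 2.3164          # plant of Example 1
r, u_max = 100.0, 20.0         # hypothetical setpoint and actuator limit
N, geom = min_horizon(a, b, r, u_max)
u_move = r / (b * geom)        # constant input applied during the N-step move
u_hold = r * (1.0 - a) / b     # steady-state input that holds y = r
inputs = [u_move] * N + [u_hold] * 20
y = simulate(a, b, inputs)
```

With these illustrative values the search returns `N = 3`, and the simulated output stays at the setpoint from step `N` onward. A constant input during the move is only one admissible choice; the LP approach selects among all admissible sequences and extends to the multivariable case.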


5  Example  2:  MIMO  RTP  Plant


This is a 4-input-4-output model of an RTP reactor as
described in [11]. The poles of the system are at 0.9052,
0.9827, 0.9856 and 0.9972 and it has no finite transmission
zeros. The plant model is specified so that the input is in
percent power and the output is temperature in °C. The
linear model is valid between 750°C and 1050°C. It is
assumed that the system is at 750°C and is to be taken
to 1050°C in the fastest possible time and held there. The
constraints on the actuators are power settings between 9%
and 100%.

Several values of $N$ were tried ($N$ = 60, 65, 70). The tracking
requirement is met for $N = 70$ as shown in Figure
6. Note that all the control signals shown in Figure 7 are
of  the  “bang-bang-hold”  variety.  It  is  interesting  to  see 
that  uj  turns  off  quickly  due  to  the  coupling  in  the  sys¬ 
tem.  However,  as  soon  as  uj  and  u,  turn  off,  uj  is  quickly 
turned back on. Since the sampling period is 0.05 second,
the system tracks the 300°C step in less than four seconds
($N = 70$ steps at 0.05 s per step is 3.5 s).
This is well below the 10-15 seconds settling-time reported
in [11].

The  zeros  of  U{z)  have  a  Butterworth  pattern  and  include 
all  the  four  plant  poles. 

While the speed of tracking is the best possible, in the actual
system, it is desired to have all the responses grow
together to avoid wafer slip. Ways to deal with this are
currently under investigation. The response shown in Figure
6 is, however, quite acceptable at low temperatures.


6  Closed-Loop  Implementation 

The  finite-time  tracking  may  be  implemented  in  a  closed- 
loop  control  structure  to  provide  disturbance  rejection  and 
robustness  with  respect  to  modeling  errors.  Consider  the 
closed-loop system shown in Figure 8 where $u$ is the tracking
command and y =  If the plant model is accurate,
in  the  absence  of  disturbances,  the  same  tracking  perfor¬ 
mance  is  achieved  as  in  the  open-loop  configuration.  The 
feedback  controller  K(z)  provides  disturbance  rejection  ca¬ 
pability  and  is  designed  independently  of  u.  Since  part  of 
the  actuation  is  now  used  for  disturbance  rejection,  the  de¬ 
sign  in  the  previous  sections  must  be  conservative.  The  de¬ 
signer  then  “backs  off*  on  timiB  and  UmAx  to  leave  enough 
head  room  for  disturbance  rejection.  One  can  choose  a 
large  enough  N  so  that  a  value  of  A  >  1  is  achieved.  The 
open-loop  control  is  decreased  by  a  factor  of  to  leave 
enough  actuator  authority  for  disturbance  rejection. 

7  Conclusions 

This paper presents a complete solution to the rest-to-rest
finite-time tracking problem in the presence of actuator
constraints. While parts of the solution had existed in the
literature, the problem is solved here for the general
multivariable case using the solution to a single linear
program. Only constant signal tracking was discussed. However,
the methodology is general and applies to tracking of signals
that can be generated by a linear system. It was assumed that
the open-loop system is stable. This is a generic property of
the process control systems for which the technique is
intended. The technique may be used to establish the upper
limit on tracking performance, i.e., the minimum tracking time
achievable. The connection to classical deadbeat control
theory is interesting: deadbeat control establishes the
minimum tracking time achievable without actuator constraints.
Even though the approach to the problem is open-loop, the
final implementation may be done in a closed-loop fashion,
i.e., using a combination of feedforward/feedback as in
Figure 8. The technique is currently being applied to thermal
systems (RTP, APCVD furnace).
Abstract

A first-principles based nonlinear algebraic model of a
rapid thermal processing chamber is used to consider an
actuator-sensor selection problem based on static
(steady-state) performance. A two-step systematic procedure is
proposed and illustrated on an example. Actuator groupings
are ranked according to the achievable best nominal
uniformity levels. Once the actuator groupings are
determined, the sensor locations are rated according to the
best worst-case achievable closed-loop performance.

1  Introduction 

Rapid thermal processing (RTP) is an efficient multi-chamber
single-wafer processing approach in integrated circuit
manufacturing. The chemical reaction recipes and high
throughput goal demand fast tracking control laws that achieve
near uniform spatial temperature distributions across a
semiconductor wafer, during both transient and steady-state
phases of the process. Due to the radiative effects, small
chamber volume and remote sensing restrictions, the desired
performance specifications pose a challenging actuator-sensor
selection problem. Multiple tungsten-halogen lamps act as the
only heat source. There is no active cooling of the wafer.
Surface properties of the wafers differ initially and also
vary during processing. Temperature readings across the wafer
by remote sensing techniques are a function of uncertain
surface properties. In the course of chamber design, different
actuator-sensor configuration choices due to lamp groupings
and pyrometer target locations need to be ranked. The ranking
of these choices will be based on “worst-case” performance
degradations under the “best” possible control design.

2  Problem  Description 

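Consider the algebraic steady-state (subject to constant
inputs) model of an RTP chamber, denoted by $\mathcal{P}$:

\[
\mathcal{P}:\ \left\{
\begin{array}{l}
0 = f(x, w, u) \\
y = g(x, w) \\
z = C x \\
u_{\min} \le u \le u_{\max}
\end{array}
\right. \tag{1}
\]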

where $w \in \mathbb{R}^{n_w}$, $u \in \mathbb{R}^{n_u}$ and
$x \in \mathbb{R}^{n_x}$ denote the uncertain constant
parameters, input values and steady-state cell temperatures,
respectively. The minimum power levels are nonnegative (no
active cooling). Vector inequalities are to be interpreted
entry by entry. The model in (1) is obtained from a
first-principles based dynamic model of an RTP chamber, where
the nonlinearities $f$ and $g$ are smooth and determined by
heat conservation equations (see, e.g., [1]). For a given $w$
and $u$, there is a unique steady state determined by (1). The
outputs $y \in \mathbb{R}^{n_y}$ denote the pyrometer readings
and $z \in \mathbb{R}^{n_z}$ denotes unmeasured regulated cell
temperatures across the wafer. For a given reference
temperature $r \in \mathbb{R}_+$, the performance goal is to
minimize the worst-case spatial temperature error
$(z - r\mathbf{1})$ subject to $w$, where $\mathbf{1}$ denotes
a vector of ones. The uncertain constant parameter vector $w$
(denoting uncertainties in emissivities, thermal conductance,
thermal mass, etc.) is in a known polytope
$w_{\min} \le w \le w_{\max}$. The effect of imposing integral
action based on the pyrometer readings ($n_y \le n_u$) will
also be considered in answering the following questions:

• What is the best uniformity predicted by the model?

• What are the best lamp grouping and sensor locations for a
range of $r$’s?

Since some of the $w$ entries represent characteristics of
the chamber, an answer to the first question above serves a
dual purpose: the limits of performance are determined, and
suggestions can be made for a possible chamber redesign to
further improve the limits of performance. The second question
above is conceptually a special case of the first, obtained by
merely grouping actuators, selecting sensors and redefining
the model under study. It has more of a practical significance
in that one can determine an implementation cost versus
achievable uniformity tradeoff in order to justify additional
sensors and/or actuators in the control design. In the rest of
the paper, we address these questions and propose solution
methods.

3  Operating  Point:  Best  Nominal  Uniformity 
Let the spatial uniformity error be quantified with its
root-sum-square error (Frobenius norm) and let $w_0$ denote a
nominal parameter. For a given reference $r$, the best
steady-state uniformity is achieved at an input level
$u = u(r, w_0)$, where

\[
u(r, w_0) = \arg\min_{u_{\min} \le u \le u_{\max},\ \mathcal{P}} \| z - r\mathbf{1} \| . \tag{2}
\]

Let $x(r, w_0)$ denote the unique state associated with (2).
Using Newton-Raphson iterations based on quadratic program
solutions, the operating point map $(u, x)(\cdot, \cdot)$ is
evaluated for different references $r$ and/or parameters $w$.
The associated minimum determines the performance lower bound
dictated by the plant model. No combination of
feedforward/feedback can achieve a nominal uniformity error
smaller than this lower bound.
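To make the bounded minimization in (2) concrete, here is a minimal sketch under strong simplifying assumptions: the nonlinear steady-state model is replaced by a hypothetical linear gain matrix `G` (so that z = G u), and the problem is solved by plain projected gradient descent rather than by the Newton-Raphson iterations with quadratic program solutions described above. The function name `best_uniformity` and all numbers are illustrative only.

```python
def best_uniformity(G, r, u_min, u_max, iters=2000):
    """Projected-gradient solve of min ||G u - r*1||_2 s.t. u_min <= u <= u_max."""
    n_z, n_u = len(G), len(G[0])
    # Step size 1/L, where L = ||G||_F^2 bounds the largest eigenvalue of G^T G.
    L = sum(g * g for row in G for g in row)
    u = [0.5 * (u_min + u_max)] * n_u
    for _ in range(iters):
        e = [sum(G[i][j] * u[j] for j in range(n_u)) - r for i in range(n_z)]
        grad = [sum(G[i][j] * e[i] for i in range(n_z)) for j in range(n_u)]
        # Gradient step followed by projection onto the box constraints.
        u = [min(max(u[j] - grad[j] / L, u_min), u_max) for j in range(n_u)]
    e = [sum(G[i][j] * u[j] for j in range(n_u)) - r for i in range(n_z)]
    return u, sum(x * x for x in e) ** 0.5


# Hypothetical 3-cell, 2-lamp gain matrix (degC per unit power).
G = [[900.0, 200.0],
     [500.0, 500.0],
     [200.0, 700.0]]
u_ind, err_ind = best_uniformity(G, r=1000.0, u_min=0.02, u_max=1.0)

# Same lamps tied together into one actuator: columns of G are summed.
G_grouped = [[sum(row)] for row in G]
u_grp, err_grp = best_uniformity(G_grouped, r=1000.0, u_min=0.02, u_max=1.0)

print(u_ind, err_ind)
print(u_grp, err_grp)
```

In this toy case the second lamp saturates at its upper limit, and grouping the two lamps into one actuator increases the best achievable error, which is the kind of comparison made between actuator groupings in the case study below.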

3.1  Case  Study:  Nominal  Uniformity 
For the model under study, the dimensions in (1) are
$n_x = 115$, $n_w = 4$, $n_u = 21$, $n_z = 21$ and $n_y = 27$.
Power levels are restricted to the interval [0.02, 1]. For a
nominal $w_0$, the full 21 actuator case and a particular
6 actuator grouping were compared for reference $r$ ranging
from 600°C to 1200°C with 100°C increments. For each actuator
grouping and reference, a problem as in (2) was solved. The
results are shown in Figure 1. By utilizing all of the
available 21 actuators, the uniformity can be improved by a
factor ranging from 5 to 10 over the reference temperature
range. As illustrated by the lower-right plot in Figure 1,
the first three pairwise groupings (the first six X’s) are
close to the first six of the 21 independent solutions (filled
circles). However, the last three groupings (denoted by 3, 4
and 7 grouped X’s) are considerably off, and the uniformity
degradation is as high as a factor of 10. The lower curve in
the lower-left plot in Figure 1 is a performance lower bound.
By generating more curves associated with candidate actuator
groupings (as done for the sample 6 grouping), one can decide
on an acceptable grouping based on the implementation
limitation and the degradation in uniformity from the
performance limitation. This uniformity comparison can also be
performed for different minimum and maximum power levels.

4  Operating  Point:  Best  Worst-Case  Uniformity 
Sensitivity minimization at a given operating point, based on
the affine approximation of the model and control, was
performed:

1. For a given reference $r$ and a nominal $w_0$, solve the
problem in (2). Let the input-state pair $(u_0, x_0)$ be the
associated minimizer. Let $y_0$ and $z_0$ be the outputs and
regulated variables at the operating point $(u_0, x_0)$.

2. Obtain the steady-state affine approximation to the plant
model in (1) about the operating point $(u_0, x_0)$. Augment
with actuator and sensor noises to obtain the real matrix
relating $(w_{aug}, u)$ to $(z_{aug}, y)$.

3. Apply the affine control law $u = u_0 + u_K + K(y - y_0)$
and determine the feedforward/feedback terms by solving

\[
\min_{(u_K, K)} \ \max_{\| \Lambda_1 (w_{aug} - \theta_1) \|_\infty \le 1} \ \| \Lambda_2 (z_{aug} - \theta_2) \|_\infty
\]

with and without an integral action constraint on a subset of
the $y$’s. Note that the maximum absolute value norm
($\| \cdot \|_\infty$) is used to conform to the interval
uncertainty description of $w$. The weights $\Lambda_1$,
$\Lambda_2$ and the centers $\theta_1$, $\theta_2$ are design
parameters. The feedforward term $u_K$ is redundant if the
centers $\theta_1$ and $\theta_2$ are associated with the
steady-state operating point. The problem can be transformed
into a linear program of the type

X  \\Ti 


(  SPynX^J) 

where the matrix variable $X$ is constrained by the integral
action constraint (if there is any) imposed on the selected
outputs $Sy$, where $S$ denotes a selection matrix obtained by
choosing the appropriate rows of an identity matrix. The
induced matrix norm $\| \cdot \|_{i,\infty}$ is the maximum
row sum. A detailed description of the transformation and
efficient solution methods can be found in [2].


Figure  1:  Nominal  uniformity  study  for  6  and  21  actuator 
configurations. 

upper  left:  the  best  nominal  spatial  temperature  error 
distribution  for  the  particular  6  actuator  grouping, 
upper  right:  the  best  nominal  spatial  temperature  error 
distribution,  all  21  actuators  utilized, 
lower  left:  comparison  of  6  and  21  actuator  groupings, 
lower right: the optimal power levels at 800°C for the 6
(X’s) and 21 (•’s) actuator groupings.

4.1 Case Study: Best Worst-Case Uniformity
For the model considered in Section 3.1, a sensitivity study
was performed at the reference r = 1100°C. The operating
point at the desired reference was computed utilizing all 21
actuators and solving the problem in (2) with power levels in
the interval [0.175, 0.85]. The plant was linearized about
this operating point and augmentation was done as described
in Section 4. The centers were selected as the nominal values
and disturbance weights were selected to reflect the
following: ±5% variation in plant parameters, ±0.5°C sensor
noise, and ±0.01 actuator noise. Six sensor locations were
chosen and two sets of feedback values were computed, with and
without an integral action constraint on the particular six
measured outputs. Let $H_z$ and $H_u$ denote the closed-loop
gain matrices from $w_{aug}$ to the regulated variables $z$
and actuator $u$, respectively (including the associated
weightings in $w_{aug}$). Similarly, let $\hat{H}_z$ and
$\hat{H}_u$ denote the associated closed-loop matrices for the
design with the integral action constraint. The subscripted
$H$ matrices are all 21 by 31 ($n_z = 21$, $n_u = 21$,
$n_y = 6$, $n_{w_{aug}} = 4 + 6 + 21 = 31$).

Let $\mathrm{sgn}(\cdot)$ denote the signum function, i.e., 1
for nonnegative and -1 for negative, evaluated entry by entry.
Let $\mathrm{abs}(\cdot)$ denote the absolute value function
evaluated entry by entry. Since the design problem is posed in
terms of the peak norm and the $w_{aug}$ weightings are
already included in the subscripted $H$ matrices, the columns
of the matrix $(H\,\mathrm{sgn}(H)^T)$ correspond to the 21
spatial worst-case candidates and the vector
$(\mathrm{abs}(H)\mathbf{1})$ denotes the spatial peak
deviation at the worst case (see Figure 2).
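The two quantities just described can be computed directly from a gain matrix. The sketch below uses a small hypothetical matrix `H` (2 by 3 rather than 21 by 31) and assumes the disturbance entries are normalized to lie in [-1, 1]; the function names are illustrative.

```python
def sgn(x):
    """Signum as defined in the text: 1 for nonnegative, -1 for negative."""
    return 1.0 if x >= 0 else -1.0


def worst_case(H):
    """Return (H sgn(H)^T, abs(H) 1) for z = H w with |w_k| <= 1 entrywise."""
    rows, cols = len(H), len(H[0])
    # Column j of H sgn(H)^T is the response of all outputs to the
    # disturbance w = sgn(H[j, :]) that maximizes output j.
    candidates = [[sum(H[i][k] * sgn(H[j][k]) for k in range(cols))
                   for j in range(rows)] for i in range(rows)]
    # Row sums of abs(H): peak deviation of each output over all admissible w.
    peak = [sum(abs(h) for h in row) for row in H]
    return candidates, peak


H = [[1.0, -2.0, 0.5],
     [-0.5, 1.0, 1.0]]
candidates, peak = worst_case(H)
print(candidates)   # diagonal entries equal the peak values
print(peak)         # max(peak) is the induced infinity norm (maximum row sum)
```

The diagonal of `candidates` reproduces `peak`, which is why the solid peak-deviation curve envelopes the dashed worst-case candidate curves in a plot such as Figure 2.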


Figure 2: Best worst-case fluctuations about the nominal, with
and without integral action constraint on the measured outputs
used in feedback. $((\cdot)\,\mathrm{sgn}(\cdot)^T)$ (dashed
lines) and $(\mathrm{abs}(\cdot)\mathbf{1})$ (solid line)
evaluated at $H_z$ (upper left), $\hat{H}_z$ (upper right),
$H_u$ (lower left), $\hat{H}_u$ (lower right).

Concluding Remarks

Actuator-sensor selection was considered as a two-step
procedure: 1) actuator grouping and nominal structural
parameter effects were investigated in terms of best nominal
uniformity (Section 3), and 2) sensor selection and the effect
of integral action constraints were investigated in terms of
best worst-case uniformity (Section 4) under parametric
uncertainty. The proposed approach is a systematic way of
determining the effect of actuator/sensor selection in
operating point sensitivity reduction. Case study results were
used to illustrate the approach.
Abstract: A first-principles based nonlinear dynamic model of a rapid
thermal processing chamber is used to consider a control/structure
interaction problem. The proposed approach ranks a particular chamber
design according to steady-state wafer uniformity analysis for constant
inputs as well as achievable transient uniformity during ramping using a
baseline integral action controller. A computational toolset is developed
to determine operating points, reduced-order small-signal equivalent
models, step- or ramp-tracking feedback controllers and a baseline dynamic
response for a given chamber design. The approach is illustrated on a model.

Keywords: Steady state, Process control, Thermal equilibrium, Temperature
profiles, Integral action.


1.  INTRODUCTION 

Rapid thermal processing (RTP) is a new approach
in integrated circuit manufacturing; it is a fast
and efficient multi-chamber single-wafer processing
approach, in contrast to the conventional slow
and costly single-chamber multi-wafer processing.
A typical RTP chamber volume is much smaller
than that of a batch processing chamber; moreover,
RTP chamber walls are cooled. Hence, successive
single-wafer processing can be done rapidly, and
chamber clean-up is not required between consecutive
processes. The chemical reaction recipes and
high throughput goal demand fast tracking control
laws that achieve near uniform spatial temperature
distributions across a semiconductor wafer, during
both transient and steady-state phases of the process.
Due to the radiative effects, small chamber
volume and remote sensing restrictions, the desired
performance specifications pose a challenging
control/structure interaction problem.

In a typical RTP chamber, multiple tungsten-halogen
lamps act as the only heat source. There
is no active cooling of the wafer. Surface properties
of the wafers may differ initially and will
certainly vary during processing as well. Temperature
measurements across the wafer have to
be done remotely, and most remote sensing techniques
rely on uncertain surface properties. A challenge
in this control/structure interaction problem
is to be able to rank different actuator/sensor
configuration choices due to particular lamp groupings,
pyrometer target locations and chamber specific
parameters according to achievable transient
and steady-state wafer spatial temperature
uniformities. The approach proposed in this paper
allows the user to determine operating points,
reduced-order small-signal equivalent models, step-
or ramp-tracking feedback controllers and a baseline
dynamic response for a given chamber design.
The approach is illustrated on a model.


2. MODEL DESCRIPTION

A first-principles based dynamic model of the
generic RTP system is denoted by $\mathcal{P}$:

\[
\mathcal{P} : (w, u) \mapsto (z, y) \ \left\{
\begin{array}{l}
\dot{x} = f(x, w, u) \\
y = g(x, w) \\
z = C x \\
\underline{u} \le u \le \overline{u}
\end{array}
\right. \tag{1}
\]

where $w \in \mathbb{R}^{n_w}$ denotes the uncertain parameters,
$u \in \mathbb{R}^{n_u}$ denotes the input values and
$x \in \mathbb{R}^{n_x}$ denotes the node temperatures. A subset
of the states $x$ corresponds to the wafer states, denoted
by $z \in \mathbb{R}^{n_z}$. Pyrometer measurements are denoted
by $y \in \mathbb{R}^{n_y}$. The minimum power levels
$\underline{u}$ are nonnegative, since there is no active
cooling. The predetermined maximum power levels are denoted
by $\overline{u}$. Vector inequalities are to be interpreted
entry by entry.

Typically, a dynamic nonlinear model as in (1)
is obtained from first principles using heat
conservation equations associated with a large-scale
nonlinear resistor-capacitor network consisting of
branches that model conduction, convection and
radiation; see, e.g., (Incropera and DeWitt, 1985).

In this paper, a generic RTP system model developed
by Ebert et al. (1995) will be used for illustration
purposes. In this particular RTP system,
the chamber is a water-cooled cylindrical cavity
with five independently powered lamps ($n_u = 5$). A
thick quartz window below the lamps separates the
lamp cavity from the wafer cavity. A thinner quartz
plate below the quartz window serves as a showerhead
to allow gas flow into the system. The silicon
wafer is located below the showerhead. A guard
ring near the edge of the wafer improves the
temperature uniformity. The model is derived for
low-pressure operation where gas flow and gas convection
heat transfer are not important. The elements
of the system are divided into nodes ($n_x = 116$) and
the associated conservation equations are derived.
A two-band radiation model is implemented to
accommodate the semitransparent quartz elements.
The model is parametrized in terms of uncertain
emissivities, thermal conductance, thermal mass,
etc. The silicon wafer is divided into $n_z = 21$ nodes.


3. STEADY-STATE ANALYSIS


In a typical reaction recipe, the wafer has to maintain
a uniform spatial profile while tracking a desired
piecewise-linear reference trajectory that represents
multiple ramp and hold phases. For example,
a wafer at room temperature is first heated up
to and held at 500°C for a prespecified duration,
then ramped up with a prespecified ramp rate
and held at 1100°C for a prespecified duration, after
which the wafer goes through a cool-down phase.
The overall time frame is on the order of seconds.
During such reaction recipes, a chamber does not
reach steady state over the short durations when
the reference is held constant. In fact, open-loop
state responses to step inputs exhibit different
time scales: e.g., fast wafer state responses versus
slower quartz window state responses. After feedback
control is applied, the wafer states are steered
to their desired steady-state values over a faster
time scale than the rest of the states by imposing
integral action on certain pyrometer measurements.
The premise of the steady-state analysis in this section
is the following: by clamping the pyrometer
measurements at prespecified steady-state values,
the wafer spatial temperature profile is kept close
to the wafer steady-state profile. Hence it is crucial
to quantify the achievable uniformity levels that the
model in (1) predicts when $\dot{x} = 0$.


Let $S$ denote a selection matrix, where
$\eta = S\,[z^T \ y^T \ u^T]^T$ denotes the appropriate entries
selected from the wafer states, pyrometer measurements
and lamp inputs, respectively. Let $\eta_0$ denote
the associated desired nominal values. For a fixed
parameter value $w_0$, a particular choice of $S$ and
$\eta_0$, and a diagonal nonnegative weighting matrix
$\Lambda$, let the operating point $(u, x)$ be given by

\[
(u, x) = \arg\min_{\underline{u} \le u \le \overline{u},\ f(x, w_0, u) = 0} \| \Lambda (\eta - \eta_0) \| . \tag{2}
\]

In other words, the operating point determined by
(2) minimizes the cost $\| \Lambda (\eta - \eta_0) \|$. The current
toolset allows the choices of Frobenius norm or peak
norm in the cost description. The problem in (2) is
solved using Newton-Raphson iterations based on
quadratic program solutions (Frobenius norm case)
or linear program solutions (peak norm case). Note
that, due to the first-principles based derivation of
(1), for a fixed parameter $w$ and an input level $u$,
there is a unique equilibrium state $x$.
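The existence of a unique equilibrium for a fixed input can be illustrated on a single lumped node. The following sketch solves a hypothetical one-node radiative heat balance by Newton-Raphson iteration; it only mirrors the inner "find the steady state for a given u" step, not the 116-node model or the quadratic and linear program solves of the toolset. All parameter names and values (`power`, `k`, `t_wall`) are assumptions.

```python
def equilibrium_temperature(u, power=4000.0, k=1e-9, t_wall=300.0,
                            x0=1000.0, tol=1e-9, max_iter=50):
    """Newton-Raphson solve of 0 = power*u - k*(x**4 - t_wall**4) for x > 0.

    x is an absolute temperature (K); u is a normalized lamp command.
    """
    x = x0
    for _ in range(max_iter):
        f = power * u - k * (x ** 4 - t_wall ** 4)   # net heat flow into the node
        dfdx = -4.0 * k * x ** 3                     # derivative of f w.r.t. x
        step = f / dfdx
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")


for u in (0.5, 1.0):
    x_newton = equilibrium_temperature(u)
    x_exact = (4000.0 * u / 1e-9 + 300.0 ** 4) ** 0.25   # closed form for this toy case
    print(u, x_newton, x_exact)
```

Because the net heat flow decreases monotonically with temperature, each input level gives exactly one positive equilibrium. The toy model has a closed-form answer, which makes the iteration easy to check.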


By determining operating points from (2), one can
answer critical chamber design specific questions
such as the following:

1) What is the best steady-state uniformity predicted
by the model? Let $r \in \mathbb{R}_+$ denote a constant
reference temperature, which is typically in
a given interval $[\underline{r}, \overline{r}]$ determined by the
reaction recipes. The performance goal in choosing an
operating point is to minimize the spatial wafer temperature
error $(z - r\mathbf{1})$ for a given $w$, where $\mathbf{1}$
denotes a vector of ones. The uncertain constant parameter
vector $w$ (denoting uncertainties in emissivities,
thermal conductance, thermal mass, etc.) is in a
known polytope determined by $\underline{w} \le w \le \overline{w}$.
For a given nominal $w_0$ and a reference $r$, the minimum
cost in (2) determines the nominal performance
limitation. The achieved minimum uniformity
error determines the performance lower bound
dictated by the plant model; moreover, suggestions
can be made for a possible chamber redesign to further
improve the limits of performance. No combination
of feedforward/feedback can achieve a nominal
uniformity error smaller than this lower bound.

2) What is the tradeoff between wafer uniformity
and sensor uniformity? Since wafer states
are not measured, the answer to 1) above provides
pyrometer reference values that can be
used for integral action control. For a given
set of sensor locations, uniformity in pyrometer
measurements need not imply wafer uniformity.

3) What are the best lamp groupings for a range
of $r$'s? This question is a special case of 1)
above, by merely grouping lamps and redefining
the model under study. It has more of a practical
significance in that one can determine an
implementation cost versus achievable uniformity tradeoff
in order to justify the need to install additional
actuators driving the lamp groups for control.

4) What is the effect of minimum and maximum
power levels on steady-state uniformity? Necessarily,
the power levels should comply with the desired
range of operation. The effect can be seen
by changing $\underline{u}$ or $\overline{u}$ in (2). Also, a
tradeoff over feasible solutions can be computed by
appropriately assigning $\eta$, $\eta_0$ and $\Lambda$ in (2).

The proposed approach to steady-state analysis
is now illustrated on the generic RTP model.

3.1 Case Study

A simple uniformity tradeoff was performed at
r = 1000°C for two regions on the wafer. Let
$z_1$ denote the first 15 wafer states and $z_2$ denote
the latter 6. In other words, the wafer is partitioned
into two equal areas: an inner disk and an
outer annular region. The cost in (2) was chosen as

\[
\left\| \begin{bmatrix} \lambda (z_1 - r\mathbf{1}) \\ (1 - \lambda)(z_2 - r\mathbf{1}) \end{bmatrix} \right\|_2 .
\]

The admissible command levels were restricted to [0, 1]
for each lamp. For eight different values of
$\lambda \in [0.2, 0.9]$, the operating points were computed.
The results are plotted in Figure 1.

The first strip in Figure 1 shows the tradeoff between
$\| z_1 - r\mathbf{1} \|_2$ and $\| z_2 - r\mathbf{1} \|_2$.
The rightmost filled circle denotes the values at
$\lambda = 0.2$; the filled circles to its left denote the
values at 0.1 increments of $\lambda$, with $\lambda = 0.9$
corresponding to the leftmost filled circle in the top strip
of Figure 1. The second strip shows the associated lamp
input levels. The last two strips show the spatial temperature
uniformities $(z_1 - r\mathbf{1})$ and $(z_2 - r\mathbf{1})$,
respectively.

An evaluation of Figure 1 reveals some interesting
chamber design specific properties at 1000°C
operation. The overall peak-to-peak wafer uniformity
cannot be improved beyond 2°C when the
cost is based on the Frobenius norm. This intrinsic
peak-to-peak limitation can be confirmed by setting
$\lambda = 0.5$ and changing the norm in the cost
description to the peak norm. Since the operating point
inputs stay within the admissible levels, increasing
the maximum power levels will not improve
steady-state uniformity. The inner disk uniformity
can be marginally improved at the expense of the
annular region uniformity, which results in a
considerable 2.5°C tilt. The spatial coordinates
and the number of lamps are in fact key
factors for further improvement. By generating
such tradeoff curves for different chamber parameters
and geometries, a crucial control specific
assessment can be made.
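The way the weighting in the two-region cost trades one region against the other can be sketched with a deliberately small example. Here each region is collapsed to a single temperature driven by one scalar input through hypothetical gains `g1` and `g2`, and input limits are ignored, so the minimizer has a closed form. None of these numbers come from the generic RTP model.

```python
def tradeoff_point(lam, g1, g2, r):
    """Minimizer of (lam*(g1*u - r))**2 + ((1 - lam)*(g2*u - r))**2 over scalar u.

    Returns (u, inner-region error, outer-region error). Input bounds are ignored.
    """
    a, b = lam ** 2, (1.0 - lam) ** 2
    # Setting the derivative to zero: u*(a*g1^2 + b*g2^2) = r*(a*g1 + b*g2).
    u = r * (a * g1 + b * g2) / (a * g1 ** 2 + b * g2 ** 2)
    return u, g1 * u - r, g2 * u - r


lams = [round(0.2 + 0.1 * i, 1) for i in range(8)]      # 0.2, 0.3, ..., 0.9
curve = [tradeoff_point(lam, g1=1100.0, g2=1000.0, r=1000.0) for lam in lams]
for lam, (u, e1, e2) in zip(lams, curve):
    print(f"lambda={lam:.1f}  u={u:.4f}  inner error={e1:+.1f}  outer error={e2:+.1f}")
```

As the weight on the inner region grows, its error shrinks while the outer-region error grows. This is the same qualitative tradeoff traced by the filled circles in the first strip of Figure 1.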


Figure 1: Steady-state uniformity tradeoff at
1000°C for two regions on the wafer in the generic
RTP model.

For $\lambda = 0.5$ (i.e., uniform weighting over the 21
wafer states), a similar uniformity study based on
(2) was done for r = 500, 800, 1100°C. The optimal
power inputs achieving the best steady-state
spatial wafer uniformity are shown in the top strip
in Figure 2. The normalized lamp inputs (with
respect to the first lamp) are plotted in the second
strip in Figure 2. The effect of grouping all of the
five lamps into one actuator using the relative gains
in Figure 2 will be illustrated later in Section 4.1.

Figure 2: Optimal $u$ for best steady-state wafer
profile at 500°C, 800°C and 1100°C (top). The
normalized power levels $u/u_1$ (bottom).

4. TRANSIENT PERFORMANCE

As discussed in Section 3, the performance limits
of a particular chamber design cannot be determined
by steady-state analysis only. This section
focuses on designing a baseline controller to provide
a dynamic response to evaluate achievable transient
uniformity.

Let $(u_0, x_0)$ be an operating point determined by
solving a problem of the type in (2). Let $P_0$ denote
the transfer matrix of the small-signal equivalent of
(1) about $(u_0, x_0)$ from input $u$ to sensor $y$.

Due to the physical properties of the process chamber,
$P_0$ is minimum phase, stable and strictly proper.
The generic RTP process chamber model used for
illustration has relative degree two. To speed up
design related iterations and to keep the order
of the feedback controller low, Hankel singular
value reduction is performed on the high-order
$P_0$ to obtain $P$. All stabilizing controllers for
$P$ in the unity-feedback configuration are given by
$C = Q(I - PQ)^{-1}$, where $Q$ is a stable transfer
matrix denoting the free design parameter. A simple
design approach is adopted by solving $PQ = F$
for $Q$, where $F$ denotes a desired closed-loop
reference to plant output transfer matrix with relative
degree at least that of $P$. Clearly, specifying
$F$ to be diagonal with nonzero entries corresponds
to a decoupling design (hence necessarily,
$n_y \le n_u$). Imposing $F(0) = I$ results in integral
action in all output channels, hence a step-tracking
design. Imposing an additional $F'(0) = 0$ results
in a ramp-tracking design. For chamber design
iterations, the adopted approach was a decoupling
ramp-tracking design, where the only design freedom
was the bandwidth determined by $F$. The resulting
controller $C$ was simulated in feedback with
the original nonlinear model in (1) with suitable offsets
and actuator saturation limits. The resulting
dynamic responses complement the tradeoff studies
in Section 3 in rating a chamber design, since
achievable ramp rates and step response limits can
be determined with the available actuation limits.
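As a worked scalar illustration of the step- and ramp-tracking conditions above, consider one hypothetical choice of $F$ with relative degree two. The time constant $\tau$ and this particular form are assumptions made here for illustration; they are not taken from the case study.

```latex
\[
F(s) = \frac{3\tau s + 1}{(\tau s + 1)^3}, \qquad \tau > 0,
\qquad F(0) = 1 .
\]
\[
F'(s) = \frac{3\tau(\tau s + 1) - 3\tau(3\tau s + 1)}{(\tau s + 1)^4}
      = \frac{-6\tau^2 s}{(\tau s + 1)^4}
\quad\Longrightarrow\quad F'(0) = 0 .
\]
% Sensitivity: the double zero at s = 0 is what gives ramp tracking.
\[
1 - F(s) = \frac{(\tau s + 1)^3 - (3\tau s + 1)}{(\tau s + 1)^3}
         = \frac{\tau^2 s^2 (\tau s + 3)}{(\tau s + 1)^3} .
\]
% Controller from Q = P^{-1} F and C = Q (1 - PQ)^{-1}:
\[
C(s) = P(s)^{-1}\,\frac{F(s)}{1 - F(s)}
     = P(s)^{-1}\,\frac{3\tau s + 1}{\tau^2 s^2 (\tau s + 3)} .
\]
% Steady-state error for a unit ramp r(s) = 1/s^2 (final value theorem):
\[
\lim_{t \to \infty} e(t) = \lim_{s \to 0} s\,\bigl(1 - F(s)\bigr)\frac{1}{s^2}
 = \lim_{s \to 0} \frac{\tau^2 s\,(\tau s + 3)}{(\tau s + 1)^3} = 0 .
\]
```

With this form the controller contains a double integrator, and $\tau$ is the single bandwidth parameter. Since $P$ is minimum phase with relative degree two, $Q = P^{-1}F$ is stable and proper.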

4.1 Case Study

A simple comparative study was performed on the
generic RTP process chamber model, where the design
goal was to ramp with a rate of 50°C/s from
500°C to 1100°C using a ramp-tracking controller.

An operating point of (1) was determined at
1100°C by solving (2) for the best wafer spatial
uniformity. The associated lamp input level is
shown in the top curve in the top strip of Figure 2.
Using the same cost criterion, the operating point
at 500°C was also computed to start the nonlinear
dynamic simulation from steady-state conditions.

The small-signal equivalent about the 1100°C operating
point was computed (i.e., the 116-state $P_0$).
For illustration purposes, pyrometer target locations
$i_1 = \{1\}$ and $i_5 = \{1, 5, 10, 15, 20\}$ were chosen
to compare a 1-input 1-output control design
performance with that of a 5-input 5-output one.

Let $S_{i_1}$ and $S_{i_5}$ denote the selection matrices
associated with the index sets $i_1$ and $i_5$, respectively.
Let $P_1$ denote the 15-state reduced-order model obtained
from $S_{i_1} P_0 \bar{u}_{1100}$, where $\bar{u}_{1100}$ denotes
the normalized lamp levels at 1100°C shown in the bottom
plot in Figure 2. In other words, $P_1$ is a
reduced-order 1-input 1-output model obtained by grouping
all 5 lamps into one and using only the first pyrometer
location. Similarly, let $P_5$ denote the 15-state
reduced-order model obtained from $S_{i_5} P_0$.
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Figure  3;  Closed-loop  response  using  Ci  and  the 
nonlinear  model  in  (1).  Center  temperature  re¬ 
sponse  (top).  Individual  lamp  inputs,  with  fixed 
ratios  (center).  Wafer  spatial  uniformity  with  re¬ 
spect  to  the  center  temperature,  (z-zil)  (bottom). 

The associated 1-input 1-output controller C_1 and
5-input 5-output controller C_5 were determined by
solving P_1 Q_1 = f and P_5 Q_5 = f I, where the scalar
f(s) is chosen to guarantee a decoupling ramp-tracking
controller. The resulting feedback controllers were
simulated with (1) by introducing appropriate offsets,
saturation and initializations. The results are shown in
Figure 3 and Figure 4.

Figure 4: Closed-loop response using C_5 and the
nonlinear model in (1). Center temperature response
(top). Individual lamp inputs (center). Wafer spatial
uniformity with respect to the center temperature,
(z - z_1 1) (bottom).

Since integral action requires P(0)Q(0) = I and the
actuation effort at steady state is related to Q(0),
good conditioning of P(0) is generally adopted as a
guideline. While this guideline is useful in weeding out
cases with order-of-magnitude differences, it is not a
complete answer to rank actuator groupings and sensor
locations used in feedback. A systematic procedure was
discussed using DC-gain matrices from (w,u) to (z,y) to
extend the conditioning argument to a best worst-case
closed-loop performance in (Kosut and Kabuli, 1995). The
approach in Section 4 and the dynamic responses as in
Figure 3 complement such condition-number-based
approaches in choosing the pyrometer locations and/or
lamp groupings.
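The conditioning guideline can be illustrated numerically. The sketch below compares two hypothetical 2x2 DC-gain matrices (the numbers are invented for illustration and do not come from the chamber model): one with well-separated lamp-to-pyrometer gains and one where the two lamp groups act almost identically. Since integral action forces Q(0) = P(0)^{-1}, a poorly conditioned P(0) shows up directly as large steady-state actuation.

```python
import numpy as np

# Hypothetical DC gains from two lamp groups to two pyrometers (not from the report).
P0_good = np.array([[1.0, 0.2],
                    [0.2, 1.0]])
P0_poor = np.array([[1.0, 0.99],
                    [0.99, 1.0]])

def steady_state_effort(P0):
    """Return Q(0) = P(0)^{-1} and the 2-norm condition number of P(0)."""
    return np.linalg.inv(P0), np.linalg.cond(P0)

for name, P0 in [("well separated", P0_good), ("nearly collinear", P0_poor)]:
    Q0, kappa = steady_state_effort(P0)
    print(name, "cond =", round(kappa, 1), "max |Q(0)| =", round(np.abs(Q0).max(), 1))
```

As the text notes, such a comparison only separates order-of-magnitude cases; it does not rank groupings with similar condition numbers.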

5.  CONCLUDING  REMARKS 

A first-principles-based nonlinear dynamic model of a
generic rapid thermal processing chamber is used to
illustrate a solution to a control/structure interaction
problem. A computational toolset is developed to rank a
particular chamber design according to steady-state wafer
uniformity analysis for constant inputs as well as
achievable transient uniformity while ramping using a
baseline integral action controller. For different
chamber design parameters, the user can determine
operating points, perform steady-state uniformity
tradeoffs, derive reduced-order small-signal equivalent
models, design step- or ramp-tracking feedback
controllers and observe a baseline dynamic response for a
given chamber design.

Abstract: The Proper Orthogonal Decomposition (POD), also
called the snapshot method [1, 2], is a nonlinear model
order reduction method where reduction of the size of the
state space is achieved using a singular value
decomposition of a matrix of snapshots of the state
vector. This method has been shown to work well for a
simple lumped physical model of a rapid thermal
processing (RTP) chamber. Although a substantial
reduction of the number of states is achieved, some
numerical computations still need to be performed in the
high-dimensional state space, which is computationally
expensive. In this paper we will demonstrate how this can
be avoided using aggregation of terms, resulting in a
significant model simulation speed improvement.

Notation: The notation x^k is used for both vectors and
matrices, where every individual element is raised to the
power k, except for the case of invertible matrices where
k = -1 denotes the regular matrix inversion. Likewise,
the . notation in front of operators on vectors means an
elementwise operation such as (a./b)_i = a_i/b_i, or
(a.b)_i = a_i b_i. We sometimes use the notation 1 for a
vector of ones. For a vector of indices σ, we will write
x_σ = Π_σ x for a vector consisting of the elements of x
indexed by the index vector σ.
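For readers who think in array code, the notation maps onto elementwise array operations. The following is a minimal sketch with made-up vectors; the 0-based indices are a NumPy convention and an assumption here, since the paper presumably counts nodes from 1.

```python
import numpy as np

a = np.array([2.0, 4.0, 6.0])
b = np.array([1.0, 2.0, 3.0])
k = 4

a_pow = a ** k            # x^k: every element raised to the power k
ratio = a / b             # (a./b)_i = a_i / b_i
prod = a * b              # (a.b)_i = a_i * b_i
ones = np.ones(3)         # the vector written as 1

sigma = np.array([0, 2])  # index vector (0-based in this sketch)
x_sigma = a[sigma]        # x_sigma = Pi_sigma x

# The same selection written as an explicit selection matrix Pi_sigma.
Pi_sigma = np.eye(3)[sigma]
```

Fancy indexing (`a[sigma]`) and multiplication by the selection matrix give the same vector; the indexing form avoids building Π_σ explicitly.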


1  Introduction 

We have recently reported successful results on nonlinear
model reduction using the Proper Orthogonal Decomposition
(POD) or snapshot method [3, 4]. The snapshot method was
capable of providing a significant reduction in the
number of integrators. However, the results did not
harness the full potential of computational savings
promised by the snapshot method, in the sense that one
nonlinear term still had to be computed in the original
high-dimensional state space coordinates, rather than in
the low-dimensional state space of the reduced-order
model.

Work supported by the Advanced Research Projects Agency
(ARPA) under Contract No. N00014-94-C-0187.

More specifically, we applied the snapshot method to the
following simplified high-order lumped physical thermal
model equations:

    \dot{x} = M^{-1} [A_c x + A_r x^4 + B u + C]    (1)

Here, M is the (diagonal) thermal mass matrix, A_c and
A_r represent the thermal conductivity and radiative
exchange matrices, respectively, B is the lamp power
input matrix, u ∈ R^5 is the lamp power, and C is a
constant term determined by the steady state with lamp
power equal to zero (see [5] for details).

The basis for the snapshot method is formed by a singular
value decomposition of the snapshot matrix X:

    X = U \Sigma V^T = [U_1  U_2] \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix} [V_1  V_2]^T

Here, X = (x_1, ..., x_M), where the x_i (i = 1, ..., M)
are the state snapshots at times t_i, and where U_i, V_i
and Σ_i (i = 1, 2) are determined by a suitable
truncation of Σ to the first n singular values. If the
state dimension is N then X is an (N × M) matrix.

Assuming that the snapshots are representative for the
state vectors that occur during model simulation, we can
approximate x by \hat{x} = U_1 z, where z is obtained by
substituting this term in (1):

    \dot{z} = U_1^T M^{-1} [A_c U_1 z + A_r (U_1 z)^4 + B u + C]
            = A_c' z + A_r' (U_1 z)^4 + B' u + C'    (2)

where

    A_c' = U_1^T M^{-1} A_c U_1 \in R^{n \times n}
    A_r' = U_1^T M^{-1} A_r
    B'   = U_1^T M^{-1} B
    C'   = U_1^T M^{-1} C \in R^n

It is obvious that when Σ_2 = 0, then \hat{x} = x. When
Σ_2 is nonzero but small, \hat{x} will be close to x.
Although the exact nature of this approximation is still
under study, the method seems to work quite well for
these particular model equations.
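The projection step can be sketched in a few lines of NumPy. Everything below is synthetic: the sizes, the random matrices standing in for A_c, A_r, B, C and the diagonal of M are assumptions chosen only so that the example runs, and the snapshots are constructed to lie exactly in an n-dimensional subspace so that Σ_2 = 0.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M_snap, n = 30, 40, 3                 # hypothetical sizes (the report uses N = 116)

# Synthetic snapshots that lie exactly in an n-dimensional subspace.
basis = np.linalg.qr(rng.standard_normal((N, n)))[0]
X = basis @ rng.uniform(0.5, 1.5, (n, M_snap))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
U1 = U[:, :n]                            # retained left singular vectors

# Hypothetical model data for x' = M^{-1}[Ac x + Ar x^4 + B u + C].
m_inv = 1.0 / rng.uniform(1.0, 2.0, N)   # diagonal of M^{-1}
Ac = -np.eye(N)
Ar = -1e-3 * np.eye(N)
B = rng.standard_normal((N, 5))
C = rng.standard_normal(N)

# Reduced matrices as defined under equation (2).
Ac_r = U1.T @ (m_inv[:, None] * Ac) @ U1
Ar_r = U1.T @ (m_inv[:, None] * Ar)
B_r = U1.T @ (m_inv[:, None] * B)
C_r = U1.T @ (m_inv * C)

def z_dot(z, u):
    """Right-hand side of the reduced model (2); note the N-dimensional (U1 z)^4."""
    return Ac_r @ z + Ar_r @ (U1 @ z) ** 4 + B_r @ u + C_r
```

Note that `z_dot` still forms the N-vector `U1 @ z` on every call, which is the remaining cost that the rest of the paper sets out to remove.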

The number of computations required to simulate (2) is
significantly less than that of (1). However, computation
of the term A_r'(U_1 z)^4 remains a problem, especially
if N is large (typical values are N = 5000 and n = 10).^1
This implies that potentially there is room for
improvement in simulation speed by a factor of several
hundreds if this term can be reduced to a lower-order
expression. It turns out that this is indeed possible.
How it can be achieved will be explained in the following
sections.
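The "several hundreds" figure can be made plausible with a rough operation count. The sketch below is an estimate under stated assumptions, not a measurement: it counts a multiply-add as two operations, counts the fourth power as two squarings per element, and assumes that the lower-order expression works on m = 12 selected entries (the number of selected nodes reported later in the paper) instead of all N entries.

```python
def radiative_term_flops(width, n):
    """Approximate operation count for evaluating A @ (V @ z)**4.

    V has shape (width, n) and A has shape (n, width).
    """
    project = 2 * width * n      # V @ z
    power = 2 * width            # two squarings per element
    combine = 2 * n * width      # A @ (...)
    return project + power + combine

N, n, m = 5000, 10, 12           # N and n from the text; m = 12 is an assumed selection size
speedup = radiative_term_flops(N, n) / radiative_term_flops(m, n)
print(f"estimated speedup for this term: {speedup:.0f}x")
```

Under this cost model the ratio is simply N/m, so the estimate depends only on how many entries are retained, not on n.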

Caveat:  If  the  model  (1)  is  obtained  from  a  PDE 
via  a  Finite  Elements  approximation,  then  it  is 
possible  to  avoid  the  problem  addressed  in  this 
paper.  However,  it  is  often  the  case  that  (1)  is 
obtained  from  a  discrete  lumped  model  with  no 
PDE  available,  in  which  case  the  method  given 
here  is  appropriate. 


2  Truly Reduced-Order Modeling by Aggregation


One obvious way of avoiding the computation of (U_1 z)^4
and its premultiplication by A_r' is to expand
A_r'(U_1 z)^4 in powers of elements of the vector z.
Unfortunately the number of coefficients can be extremely
large. For instance, if n = 10, every element of
(U_1 z)^4 contains many thousands of coefficients, which
makes this method very impractical.

Instead of analytically computing the coefficients, one
could of course compute a polynomial approximation of
A_r'(U_1 z)^4 using a least squares fit of the form
Θ_r φ(z), where φ is a vector of polynomial terms in the
elements of z. We have tried this, using snapshot-based
values of z to construct matrices of regression
variables, with the choice

    \phi(z) = [ 1  z^T  (z^2)^T  (z^3)^T  (z^4)^T ]^T

In this least squares fit we used the term A_r' x^4
rather than A_r'(U_1 z)^4, since the latter is only an
approximation of the former, which is the exact term that
we wish to approximate:

    [ A_r' x_1^4  \cdots  A_r' x_M^4 ] \approx \Theta_r [ \phi(z_1)  \cdots  \phi(z_M) ]

Unfortunately, using the term Θ_r φ(z) as a substitute
for A_r'(U_1 z)^4 resulted in an extremely numerically
unstable simulation. Closer inspection revealed that the
approximation was at best reasonable only close to the
region where the snapshots had been taken. Obviously,
expansion in straight polynomial terms is not very
robust.

^1 For practical reasons, we have limited our full-order
model to N = 116 nodes. However, we believe that the same
methodology would work for much higher model orders.

The next approach that could have been tried would be to
replace the regression vectors 1, z, z^2 and z^3 with a
variety of purely 4th-order terms and try again. Since
there are many such terms, this would have resulted in an
endless search for the optimal set of regression
variables.

Fortunately it is not necessary to do this. It turns out
that there is a very natural and convenient choice of
regression variables that works much better, namely one
of the form (Π_σ U_1 z)^4, where σ is a suitable index
vector. If σ is chosen as the vector of node indices
corresponding to the nodes that undergo most of the
excitation from the individually perturbed lamp power
inputs, such as the 5 lamp temperatures and 5 evenly
distributed wafer temperatures, complemented with a
single window and showerhead temperature, a near-perfect
fit Θ_r (Π_σ U_1 z)^4 of A_r'(U_1 z)^4 is obtained. For a
physical interpretation of these variables we refer to
Figure 1, which shows a diagram of the generic RTP
chamber model [5] with the selected nodes. Using this
term in the actual simulation yielded results that were
almost indistinguishable from those of the original
low-order model equations, while requiring much less
simulation time. An explanation of how this is possible
follows.

Figure 1: Schematic of the axisymmetric generic RTP
system

With discretized thermal systems where the states
represent the node temperatures, oftentimes many states
display a similar behavior. More precisely, let us assume
that the state vector x can be approximated as
x ≈ S x_σ, where σ is an index vector of certain
representative states, and S is a matrix with only 1
nonzero element in every row. In other words, all
temperatures are grouped and within every group the
signals differ only by a constant factor. We will refer
to this property as the property of aggregation, or by
saying that x is aggregated. Under the assumption of
aggregation,

    A_r' x^4 \approx A_r' (S x_\sigma)^4 = A_r' S^4 x_\sigma^4 \approx A_r' S^4 (\Pi_\sigma U_1 z)^4

This suggests that it is possible to make a good least
squares fit using the snapshot data of the form

    A_r' (U_1 z)^4 \approx \Theta_r (\Pi_\sigma U_1 z)^4

This approximation is orders of magnitude more efficient
to compute since the number of columns of Θ_r is much
less than that of A_r'. Note that the assumption of
aggregation was made only for the purpose of finding a
suitable set of regression vectors, and that it has no
direct implication for the properties of the
reduced-order model. However, in cases where it does not
quite hold, the least squares fit may still be able to
correct the situation since it simply provides the best
fit on the set of regression variables, regardless of the
underlying assumptions.
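The fit itself is an ordinary linear least squares problem. The sketch below uses synthetic data that satisfies the aggregation assumption exactly (x = S x_σ with one nonzero per row of S), and it uses the snapshot states directly in place of U_1 z; the sizes, the random stand-in for A_r', and the choice of the first m nodes as σ are all assumptions made for illustration. With exact aggregation the fit recovers Θ_r = A_r' S^4 and the residual is at rounding level.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n, m, M_snap = 40, 4, 6, 200          # hypothetical sizes

# Aggregated synthetic states: x = S x_sigma, S has one nonzero entry per row.
sigma = np.arange(m)                      # representative nodes (0-based, assumed)
group = np.concatenate([sigma, rng.integers(0, m, N - m)])
S = np.zeros((N, m))
S[np.arange(N), group] = np.concatenate([np.ones(m), rng.uniform(0.8, 1.2, N - m)])

X_sigma = rng.uniform(0.5, 1.5, (m, M_snap))
X = S @ X_sigma                           # snapshot matrix, columns are states

Ar_r = rng.standard_normal((n, N))        # stands in for A_r' = U1^T M^{-1} A_r

# Least squares: Ar_r @ X**4  ~=  Theta_r @ (X[sigma])**4
target = Ar_r @ X ** 4                    # n x M_snap
regress = X[sigma] ** 4                   # m x M_snap
Theta_r = np.linalg.lstsq(regress.T, target.T, rcond=None)[0].T

residual = np.linalg.norm(Theta_r @ regress - target) / np.linalg.norm(target)
```

With real snapshots the aggregation assumption holds only approximately, so the residual would be nonzero; the fit then returns the best Θ_r for the chosen σ.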

We would like to emphasize that this approach requires
only minor engineering judgement regarding the selection
of nodes that are used in the least squares fit. This
makes the method physically pleasing, while it is also
quite objective and requires no quantitative model tuning
as is the case with aggregation-based model reduction.


3  Application to a More General Model Structure

The model equations (1) are unduly restrictive, and it is
necessary to investigate if true reduced-order modeling
can be applied to more general RTP chamber models. For
this, we will investigate the following set of model
equations:

    \dot{x}          = M(x)^{-1} [-C(x) - R(x) + B u - W(x)]
    R(x)             = A_r^{(1)} ((1 - \beta(x)) . x^4) + A_r^{(2)} (\beta(x) . x^4)
    W(x)             = K_l . (x - T_l)
    C(x)             = A_c diag(\Gamma(A_{cp} x)) A_c^T x
    \Gamma(A_{cp} x) = \kappa(A_{cp} x) ./ (A_{cp} x)
    M(x)             = diag(m . \eta(x) ./ x)                       (3)

Here, the variables have the following interpretation:

    x              State vector (node temperatures)
    R              Radiation loss term
    W              Wall loss term
    C              Conduction loss term
    M              Thermal mass matrix
    β              Black-body radiation fraction
    A_c, A_cp      Sparse matrices representing branch conductivity
    κ, η           Scalar polynomials, used for branch conductivity
                   and specific heat
    m, K_l, T_l    Constant vectors

The difference between (3) and the simple model (1) is
twofold: several matrices have been made
temperature-dependent, and the radiative transfer has
been separated into two frequency bands (related to
A_r^{(1)} and A_r^{(2)}).

We take the same approach as in Section 2: write
x = U_1 z, then make least squares fits to approximate
several terms so as to reduce all computations to low
order using aggregation. Due to the dependence of the
thermal mass matrix M(x) on the state, it is more
convenient to keep the M(x) term on the left-hand side:

    M(x) \dot{x} = -C(x) - R(x) + B u - W(x)

Substituting x = U_1 z,

    M(U_1 z) U_1 \dot{z} = -C(U_1 z) - R(U_1 z) + B u - W(U_1 z)

Under the assumption of aggregation, we can reduce the
size of the computations by selecting the representative
states indexed by σ on both sides:

    \Pi_\sigma M(U_1 z) U_1 \dot{z} = -C_\sigma(U_1 z) - R_\sigma(U_1 z) + B_\sigma u - W_\sigma(U_1 z)

Now we can focus on approximation of the right-hand side
terms. Only the terms R_σ(x) and C_σ(x) require some
further investigation.

At this point some remarks on the comparison between the
generic reduced-order model equation (3) and that of the
simplified generic model (2) are in order. With (2), we
were able to get \dot{z} on the left-hand side by
premultiplying with U_1^T. By virtue of the fact that U_1
is composed of the set of (orthogonal) left singular
vectors, this premultiplication represents a nicely
balanced way of obtaining the reduced-order model
equations. However, with (3) we have to perform a
left-inversion of the term Π_σ M(U_1 z) U_1 to achieve
the same. The numerical conditioning of this inversion
depends strongly on the selection σ and may require
adding in some more nodes.

We will therefore end up with two orders: one for the
number of integrators (dimension of z), and one for the
number of selections (dimension of σ) to approximate the
nonlinear behavior and to precondition the left-inversion
of (3). Although it may seem as if the determination of σ
and the related left-inversion introduce too much
subjectivity into this approach, it also gives us a
unique chance to extract exactly that behavior that
focuses on specific components of the system (wafer
temperature) that are crucial for control design
purposes. Much less weight is attached to other states
that are only required to support the dynamic behavior of
the reduced-order model; they are approximated in a
fairly crude way.

The Radiative Loss Term R_σ(x)

The selected components of R(x) are given by

    R_\sigma(x) = \Pi_\sigma A_r^{(1)} ((1 - \beta(x)) . x^4) + \Pi_\sigma A_r^{(2)} (\beta(x) . x^4)

where the subscript σ indicates selection of the
components indexed by σ. The expression for β(u), where u
is a scalar, is of the form

    \beta(u) = c_1 p(k/u) + c_2 p(2k/u) + c_3 p(3k/u)
    p(u)     = e^{-u} (u^3 + 3u^2 + 6u + 6)

Based  on  this,  we  will  try  the  following  least 
squares  fits: 


where  the  values  for  x  and  are  taken  from  the 
snapshots. 


The Conduction Loss Term C(x)

Reduction of C(x) is the hardest part of the model
reduction procedure. It should be noted that the
temperature dependence reflected by Γ(A_cp x) is
generally quite small, so in the first instance one could
approximate Γ(A_cp x) by a constant and simply precompute
the (n × n) matrix A_c diag(Γ) A_c^T U_1, after which the
job is completed. This might work quite well, especially
since the radiation term is the dominant factor in RTP
chamber models. In cases where this assumption is not
justified one could proceed as follows.

First, we have to notice that A_c and A_cp are directly
related to the branches between the nodes. More
precisely, the columns of A_c contain zeros everywhere
except for two elements which have the values 1 and -1.
The term A_c^T x therefore represents temperature
differences between nodes. Similarly, A_cp contains zeros
everywhere except for two elements that both have the
value 1; therefore it represents average temperatures of
nodes.
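The branch interpretation is easy to see on a toy network. The sketch below builds A_c and A_cp for a hypothetical 4-node chain with 3 branches; the topology, temperatures and conductance are invented, and the transposes are applied so that the dimensions work out with one column per branch. With both nonzero entries equal to 1, the product with A_cp gives the sum of the two end temperatures, so the branch average is half of that value.

```python
import numpy as np

# Hypothetical 4-node chain with 3 branches: 0-1, 1-2, 2-3.
branches = [(0, 1), (1, 2), (2, 3)]
n_nodes = 4

Ac = np.zeros((n_nodes, len(branches)))   # one column per branch, entries +1 and -1
Acp = np.zeros((n_nodes, len(branches)))  # same sparsity, both entries +1
for j, (a, b) in enumerate(branches):
    Ac[a, j], Ac[b, j] = 1.0, -1.0
    Acp[a, j], Acp[b, j] = 1.0, 1.0

x = np.array([900.0, 880.0, 850.0, 845.0])   # node temperatures (arbitrary values)
dT = Ac.T @ x            # temperature difference across each branch
T_sum = Acp.T @ x        # sum of the two end temperatures of each branch

g = 0.1                  # assumed constant branch conductance
flow = Ac @ (g * dT)     # net conductive heat flow out of each node
```

Because every column of A_c sums to zero, the node flows in `flow` sum to zero as well: conduction only redistributes heat between nodes.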

Because of the physical interpretation, we cannot select
components based on σ. Instead, we have to group the set
of branches between nodes into subsets that display a
similar behavior, in the same way as we did for the
selection of nodes. This results in a selection τ by
picking one representative element from every subset. The
natural consequence of our approach is to fit the
following model approximation:

HrAe  diag(r(A<pr))Afi » 

Gc  diag(r(nrAcpi))n^Arx  (4) 

where  the  matrices  Ac,r  and  Acp,T  have  been  con¬ 
structed  in  the  same  way  as  Ac  and  Acp  ,  based 
on  branches  between  groups  of  signals  instead  of 
individual  signals. 

Doing this requires more engineering judgement than we
would like to use; therefore we will first try a simpler
approach. Assume that the conductivity term of a branch
between two nodes x_1 and x_2 is described by
p((x_1 + x_2)/2)(x_1 - x_2), where p(x) is a polynomial.
This corresponds to one branch element of (4). Assuming
that the temperatures x_1 and x_2 are approximately the
same, and that p(x) can be approximated by a second-order
polynomial p(x) = p_0 + p_1 x + p_2 x^2, we can write

    p((x_1 + x_2)/2)(x_1 - x_2) \approx (p_0 + p_1 x_1 + p_2 x_1^2) x_1
                                        - (p_0 + p_1 x_2 + p_2 x_2^2) x_2

Here, we will substitute the elements of Π_σ U_1 z for
x_1 and x_2 for all branch conductivity approximations of
this form. Again, we rely on the least squares fit to
compensate for any approximation errors caused by our
simplifying assumptions.


4  Results


The Simplified Generic Model

We used 830 samples of the same dataset as was described
in [3, 4] for the selection of snapshots. A selected set
of signals of this dataset (lamp, wafer, showerhead and
window temperatures) is depicted in Figure 2, where the
different characteristics of these four groups are
clearly recognizable. The input for the dataset consists
of a nominal input sequence to which a PRBS-based input
disturbance was added for persistent excitation. This
disturbance can be described as the sum of four
individual PRBS sequences, where each sequence has a
bandwidth based on the dominant time constant of one of
the four signal groups, with every component of the five
lamp power inputs excited independently. Using the
simplified generic model reduction described by (2),
several models were obtained for different choices of
snapshots and model orders. These models were then
validated on the nominal dataset, which is stochastically
independent of the dataset used for model reduction and
is therefore a legitimate validation dataset. The results
can be summarized by stating that a 12th-order model
based on 70 snapshots resulted in a 2-degree RMS
validation error, which is an excellent result. This was
based on a selection of 5 lamp temperatures, 5 wafer
temperatures, 1 showerhead and 1 window temperature (12
elements).


Figure 2: The PRBS-perturbed nominal trajectory


The Generic Model

The more difficult task of reducing the generic model was
accomplished in much the same way without great
difficulty. The model order used was 12, while the
selection σ was based on 5 lamp temperatures, 5 wafer
temperatures, 5 showerhead temperatures, 1 window and 1
wall temperature (17 elements). The 5 extra selections
significantly improved the simulation of the other,
non-selected wafer and showerhead temperatures. Figures 3
and 4 show the simulation results of the 21 wafer and 28
showerhead temperatures, respectively.


Figure 3: Wafer temperatures predicted by the
reduced-order model


Figure 4: Showerhead temperatures predicted by the
reduced-order model


The most important result is that the wafer temperatures
are simulated with an error that is generally less than 2
degrees, which is an excellent result. There are some
off-center showerhead temperatures that deviate by more
than 20 degrees, but those states are only of secondary
importance for our model reduction effort. Instead, it is
shown that we can model the wafer temperatures quite well
with a low-order, nonlinear model by making appropriate
signal selections. This type of model reduction can be
extremely useful for the implementation of controllers,
by providing on-line nonlinear model estimates of the
temperatures.


5  Summary  and  Conclusions 


We have applied the snapshot method to the generic RTP
chamber model and have demonstrated how the numerical
efficiency can be improved substantially by reducing the
nonlinear terms using least squares. The model reduction
is based on a selection of singular values for the number
of integrators, and a selection of aggregated states for
reduction of the nonlinear terms. Some engineering
judgement is required for a proper selection. When this
is done correctly, low-order nonlinear models are
obtained that are capable of predicting very accurately
several states corresponding to nodes near the ones that
were selected for aggregation. Although the precise
nature of the approximations is a topic for further
research, the approach seems intuitive and very promising
for model reduction of high-order lumped physical models.

ABSTRACT

Some new results in learning feedforward control are
presented in this paper. The proposed procedures are
applicable to processes with or without feedback control.
Under the ideal situation, this discrete-time Learning
Control (LC) scheme can perfect the task with one repeat.
Convergence analyses are also included for noisy and
imprecise learning. New algorithms are proposed to
address these noisy learning issues. Interesting
connections between the proposed approach and iterative
image deblurring, LQ control and Kalman filtering are
identified. With all the required process knowledge
readily measurable, the proposed scheme is relatively
simple to implement and can be used as an on-site
feedforward tuning tool. Encouraging simulation results
are included.


1  Introduction 

It is well established that a properly designed feedforward (FF)
control complements feedback (FB) control by promoting
non-delay and anticipatory actions, leading to superior tracking
or disturbance rejection. Unlike the traditional FF design,
which is usually based on analytical modeling and is therefore
often difficult to tune, the Learning Control (LC) approach
described here is suitable for on-site tuning as it can learn
from past experiences. (Many manufacturing tasks are repetitive
and task-oriented, and are potential applications of LC.)

Unlike some of the published works, e.g., [1, 2], which use
proportional, proportional + derivative, or state errors for
iterative learning, this paper focuses on (discrete-time) linear
time-invariant (LTI) predictive control type techniques which can
lead to rapid convergence. Under ideal situations, this LC scheme
can perfect the tracking task with one repeat (Section 2). The
only required knowledge is the impulse responses, which can be
readily measured on-site. In that regard, our scheme is somewhat
similar to the one in [3]. However, unlike [3], our results
are applicable to multi-input, multi-output (MIMO) cases and
are less restrictive in assumptions. It can also be shown that our
results lead to the main convergence results in [3]. Convergence
analyses for noisy and imprecise cases (Section 3) are included,
and new algorithms are proposed to address these issues
(Sections 4, 5, 6). Rapid learning is demonstrated by the
simulation of a MIMO process. Interesting and important connections
with iterative image deblurring [7], LQ control and Kalman
filtering are identified. These connections allow a unifying system
inversion viewpoint and the sharing of algorithmic ideas from
seemingly unrelated fields.

2  The  Basic  Learning  Algorithm 

Consider a combined FB and FF control configuration as shown
in Figure 1. The basic LC scheme and its variants can be
described by:

    f^{(i+1)} = f^{(i)} - K e^{(i)}    (1)

where e^{(i)} is the tracking error vector^1 and f^{(i)} the FF
signal in the i-th iteration, and K is a general (linear) operator
which effects LC. This LC operator, K, may include time-advance
operations.^2 Realizing that both f and the command r contribute
to e, straightforward analysis yields the following relationship in
noiseless conditions:

    f^{(i+1)} = (I - K M) f^{(i)} - K \Gamma r    (2)

where M is the transfer operator from f to e, and Γ is the
transfer operator from the command r to e:

    M = -(I + G_p G_c)^{-1} G_p    (3)

    \Gamma = I - (I + G_p G_c)^{-1} G_p G_c    (4)

Notice that if K is chosen as the inverse of M, i.e., the transfer
from f to e, then exact convergence with one repeat is possible.
This is because

    K = M^{-1}    (5)

    f^{(i+1)} = (I - M^{-1} M) f^{(i)} - M^{-1} \Gamma r
              = G_p^{-1} r    (6)

where f = G_p^{-1} r is the "perfect" FF control signal for
reproducing the desired trajectory r.^3

A more relaxed condition for convergence is that

    |\lambda_i(I - K M)| < 1    (7)

    or  0 < \lambda_i(K M) < 2    (8)

where λ_i denotes the eigenvalues. This relaxed condition allows
the use of a broader scope of algorithms, ranging from cases
where M^{-1} is not known exactly (e.g., due to measurement
imprecision or process nonlinearity) to gradient descent type of
optimization algorithms described in Section 5. Now, instead of
converging in one repeat, it will take multiple repeats to achieve
satisfactory results.^4


Figure 1: Combined Feedforward and Feedback Configuration


* Work supported in part by ARPA under AFOSR Contract No.
F49620-94-C-0003.

^1 Under certain conditions, the signal u_fb, instead of the
tracking error e, may be used for iterative learning, where u_fb is
the deviation of the FB controller output from its initial steady
state. This is because if the FF is doing a perfect job for the
transient, then the FB controller would not have to labor.

^2 Time-advancing is necessary to implement correct "credit
assignment" for learning. This is because f applied at time t will
not affect the process or e until time t + 1 or later.

^3 The meanings and existence of G_p^{-1}, and the interpretation
of the algorithm when G_p^{-1} does not exist, are explained at the
end of this section.
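Conditions (7) and (8) can be checked numerically for a given pair K, M. The sketch below is illustrative only: the 3x3 lower-triangular M is made up, and K is taken as a scaled copy of M^{-1} to mimic an imprecisely known inverse, so that the eigenvalues of KM all equal the scale factor.

```python
import numpy as np

def learning_converges(K, M):
    """True if every eigenvalue of I - K M lies strictly inside the unit circle."""
    eig = np.linalg.eigvals(np.eye(M.shape[0]) - K @ M)
    return bool(np.max(np.abs(eig)) < 1.0)

# Hypothetical lower-triangular M (3 time steps, scalar signals).
M = -np.array([[0.5, 0.0, 0.0],
               [0.3, 0.5, 0.0],
               [0.1, 0.3, 0.5]])
M_inv = np.linalg.inv(M)

exact = learning_converges(M_inv, M)            # K = M^{-1}: I - K M = 0
cautious = learning_converges(0.5 * M_inv, M)   # eigenvalues of K M are 0.5
too_bold = learning_converges(2.5 * M_inv, M)   # eigenvalues of K M are 2.5 > 2
```

A gain that undershoots the inverse still converges, only more slowly, while an overshoot beyond a factor of 2 violates (8) and the iteration diverges.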


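Computationally, this basic algorithm involves the following:

    \begin{bmatrix} e(2) \\ e(3) \\ e(4) \\ \vdots \\ e(N+1) \end{bmatrix}
    = -\begin{bmatrix} h_1 & 0 & \cdots & 0 \\ h_2 & h_1 & \ddots & \vdots \\ \vdots & & \ddots & 0 \\ h_N & \cdots & h_2 & h_1 \end{bmatrix}
      \begin{bmatrix} \delta f(1) \\ \delta f(2) \\ \vdots \\ \delta f(N) \end{bmatrix}    (9)

where h_1, h_2, h_3, ..., h_N is the closed-loop impulse response
sequence from f to e. Its state-space equivalent would be
h_k = C A^{k-1} B, where [A, B, C] stands for a state-space
representation of the f to e process. The matrix in (9), denoted M,
is a time-domain realization of the transfer operator M.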

Using the impulse response representation directly facilitates
on-site measurement and simpler computation. Alternatively, one
may replace the impulse responses in the matrix by step
responses and solve for Δδf(t) = δf(t + 1) - δf(t) instead.^5
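When step responses are what is measured on-site, the impulse response samples follow by first differencing. The sketch below shows this for a scalar channel with a made-up response, assuming the step response is zero before the step is applied.

```python
import numpy as np

def impulse_from_step(step):
    """Impulse response samples h_1..h_N from step response samples s_1..s_N (s_0 = 0)."""
    return np.diff(step, prepend=0.0)

h_true = 0.4 * 0.7 ** np.arange(10)      # hypothetical impulse response
step = np.cumsum(h_true)                 # "measured" step response
h_est = impulse_from_step(step)
```

With noisy measurements, differencing amplifies high-frequency noise, which is one reason the noisy-learning variants discussed in the paper's later sections matter in practice.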

This procedure is validated using a simulated 3x3 LTI MIMO
process stabilized with decentralized PI controllers. Figure 2
illustrates the system performance in tracking a ramp command.
The top plots show the tracking performance with feedback control
only; the bottom plots show the performance after one repeat of
the task. The impulse response matrix is estimated by differencing
the "measured" step responses. The simulation is done
using the graphics-oriented control design software SystemBuild
and MATRIXx. The solid lines indicate the command trajectories
and the dashed lines the actual process responses. It is
apparent that in this idealized situation, the basic learning
approach achieves perfect tracking in one learning repeat. Issues
with noisy cases are addressed in Sections 3, 4, 5 and 6. Before
closing this section, some remarks are in order:

Remark  1: 

In the rest of the paper, the class of learning schemes will be
formalized as:

$$f^{(i+1)} = f^{(i)} + \delta f^{(i)} \qquad (11)$$

$$\delta f^{(i)} = -\mathcal{K}\,(\varepsilon^{(i)}, f^{(i)}) \qquad (12)$$

*Noting that with $M$ in the lower triangular form as in Eq. (9),
so is $M^{-1}$. If $K$ is restricted to be lower triangular, then so is $KM$.
The $\lambda$'s of a lower triangular matrix are the diagonals. This leads
to the main convergence result, Proposition 3.1, in [3].

*Certainly, this procedure does not prohibit one from using more
elaborate system identification techniques. The impulse or step
response matrices can be generated by a model obtained through
parametric identification.


Figure  2:  FF  LC  via  the  Basic  Learning  Algorithm 
where $(\varepsilon^{(i)}, f^{(i)})$ is used to represent the error (i.e., $e^{(i)}$ or
$u_{fb}^{(i)}$) observed in the i-th task repeat with $f^{(i)}$ applied. This more
expressive notation is needed in Section 4, where certain noise-averaging
learning algorithms are developed.

Remark  2: 

The use of $G_p^{-1}$ assumes the existence of a process "right inverse"
[4].* A necessary condition is that the number of process inputs
must be no less than the number of outputs. With the existence
of $G_p^{-1}$, the existence of $M^{-1}$ for the closed loop is assured.

When the $M$ matrix is 1-delay,* the matrix $h_1$ (or $CB$ in state-space
notation) has rank n. The $M$ matrix in Eq. (9) is then invertible.
This (restrictive) condition is assumed by some of the
published work, e.g., [1, 3]. When the total delay is greater
than 1, the $M$ matrix, as presented in Eq. (9), would be rank
deficient. In that case, one can either solve the equation as is,
but in a least-squares sense, or shift the error vector, e, on the
left-hand side so that it would start with t = L + 1.

In either case, the notion of least-squares is important because
it allows flexibility in seeking meaningful, practical solutions,
including the various forms of constrained optimization solutions
discussed later. Constraining the magnitude of control is needed
when the process is non-minimum phase or is subject to tight
physical constraints. Constraining the rate of control is useful
to avoid control rate saturation and to reduce the noise effect.


3  The  Basic  Algorithm  with  Uncertainties


In  this  section,  two  types  of  uncertainties  are  considered:  process 


*The concept of L-delay inversion by Massey and Sain [5] is of
direct relevance here. A discrete-time process $G_p$ is L-delay right
invertible if it can perfectly track any given command after L delays.
Notice that L is not unique and the smallest L is often of the greatest
interest. It is also known that if a process is right invertible then it
must be so with no greater than p delays, where p is the dimension
of the state space - a similar property exists for controllability and
observability. The following test [5] can determine the minimum
delay L:

A process is L-delay right invertible if and only if
$\mathrm{rank}(M_L) = \mathrm{rank}(M_{L-1}) + n$, where n is the process output
dimension and $M_L$ is the (nL x mL) leading principal minor of
$M$ in Eq. (9).

*For presentation, we have assumed that the discrete-time process
has 1 delay due to sampling. This is reflected in
Eq. (9) in that the error vector e starts from t = 2. In reality the
process could have pure time-delays and/or L-delays.




or measurement noise and imprecise impulse response model.

First consider the effect of the process noise, v (added to the
output y). With the process noise, the learned f obeys the
following recursion:

$$f^{(i+1)} = (I - KM)\,f^{(i)} - K\Gamma r + K\Gamma v^{(i)} \qquad (13)$$

where $\Gamma$ is the transfer operator from r to $\varepsilon$ (i.e., e or $u_{fb}$), and
$v^{(i)}$ is the process noise for the i-th repeat.

When K is precisely chosen as $M^{-1}$, the above recursion reduces to

$$f^{(i+1)} = G_p^{-1}\,[r - v^{(i)}] \qquad (14)$$

The output tracking error after the (i+1)-th repeat is

$$e^{(i+1)} = r - y^{(i+1)} \qquad (15)$$

$$= -E\,[v^{(i+1)} - v^{(i)}] \qquad (16)$$

where $E = (I + G_pG_c)^{-1}$ is the transfer function from the process
noise, v, to the process output, y.

Notice that any repeatable disturbance in v is eliminated by this
learning procedure. However, non-repeatable random noise is
amplified by a factor of $2^{1/2}$ on the standard deviation.
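
The $2^{1/2}$ factor can be checked numerically. The following Python sketch is only an illustration under assumptions that are not in the original: E is taken as the identity, the non-repeatable noise is white Gaussian with a hypothetical standard deviation `sigma`, and a ramp-shaped `repeatable` disturbance is added identically to every repeat.

```python
import numpy as np

rng = np.random.default_rng(0)
n_repeats, n_samples, sigma = 2000, 50, 1.0

# Non-repeatable noise: a fresh realization for every task repeat.
random_part = rng.normal(0.0, sigma, size=(n_repeats + 1, n_samples))
# Repeatable disturbance: identical in every repeat (illustrative ramp).
repeatable = np.linspace(0.0, 5.0, n_samples)
v = random_part + repeatable

# Error after learning with E = I (an assumption): e^(i+1) = -(v^(i+1) - v^(i)).
e = -(v[1:] - v[:-1])

amplification = e.std() / sigma   # expected to be close to sqrt(2)
residual_bias = abs(e.mean())     # the repeatable part cancels in the difference
```

The repeatable ramp drops out of the difference of consecutive repeats, while the two independent noise realizations add in variance, which is where the square root of two comes from.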

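Next, consider the effect of imprecise impulse response measurements,
M. Assuming K is the inverse of the measured (imprecise) impulse
response matrix, and $KM = I + B_M$, yields

$$I - KM = -B_M \qquad (17)$$

and

$$f^{(i+1)} = (I - KM)\,f^{(i)} - K\Gamma\,[r - v^{(i)}] \qquad (18)$$

$$= -B_M f^{(i)} - KMM^{-1}\Gamma\,[r - v^{(i)}] \qquad (19)$$

$$= -B_M f^{(i)} + (I + B_M)\,G_p^{-1}\,[r - v^{(i)}] \qquad (20)$$

$$= G_p^{-1} r + B_M\,(G_p^{-1} r - f^{(i)}) - (I + B_M)\,G_p^{-1} v^{(i)} \qquad (21)$$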


of the learned f; another uses "soft" inversion to avoid amplifying
noise. The one based on a Kalman filter formulation is
described in Section 6. A noise-averaging technique which can
be built into the basic learning algorithm is described in this
section.

From an optimization viewpoint, the basic learning algorithm
eliminates the most recently observed learning error, $\varepsilon^{(i)}$. The
noise-averaging formulation, however, replaces $\varepsilon^{(i)}$ with the
statistically averaged learning error, which takes into consideration
all the previous learning trials. Of course, as f evolves with
the learning process, portions of the previous learning errors
are due to different f. After correcting for the differences in f
using the M model, it is possible to keep track of a statistically
averaged learning error which assumes a common f, but with
multiple realizations of the process noise averaged out. Choosing
f according to the following Statistically Averaged Inversion
(SAI) algorithm minimizes this averaged learning error:

/(•+»  =  /(•)  _  M->  (£<’>,/(.•  -f.  1))  (23) 

for i = 0, 1, 2, ... . This algorithm has the flavor of Stochastic
Approximation, and it can be shown that the noise effect
on the learned f is asymptotically eliminated as learning continues
(provided that the model M is known precisely). Note,
however, that the correction gain decreases as i increases
(i.e., 1/(i + 1) = 1, 1/2, 1/3, ...), and the learning process
eventually becomes open-loop. This observation, along with Eq. (20),
suggests that in the case where M is not known precisely, part
of the initial error on f may remain as uncorrected bias.
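
A small numerical sketch may help to see the averaging effect of a decreasing correction gain. It is written under assumptions that are not taken from the paper: a hypothetical SISO impulse response, an error model of the form eps = M (f - f_star) + noise, a precisely known M, and illustrative names such as `run_learning` and `lower_toeplitz`. It compares a constant gain of 1 (basic inversion) with the gain 1/(i + 1).

```python
import numpy as np

def lower_toeplitz(h):
    """Lower-triangular Toeplitz matrix whose first column is h."""
    n = len(h)
    M = np.zeros((n, n))
    for k in range(n):
        M += np.diag(np.full(n - k, h[k]), -k)
    return M

def run_learning(M, f_star, noise, gain):
    """Iterate f <- f - gain(i) * M^{-1} eps with eps = M (f - f_star) + noise[i]."""
    f = np.zeros_like(f_star)
    for i, n_i in enumerate(noise):
        eps = M @ (f - f_star) + n_i          # observed (noisy) learning error
        f = f - gain(i) * np.linalg.solve(M, eps)
    return f

rng = np.random.default_rng(1)
N, repeats = 30, 20
M = lower_toeplitz(0.5 * 0.8 ** np.arange(N))   # hypothetical impulse response
f_star = np.sin(np.linspace(0.0, np.pi, N))     # hypothetical "perfect" FF signal
noise = rng.normal(0.0, 0.05, size=(repeats, N))

f_basic = run_learning(M, f_star, noise, gain=lambda i: 1.0)
f_sai = run_learning(M, f_star, noise, gain=lambda i: 1.0 / (i + 1))

err_basic = np.linalg.norm(f_basic - f_star)
err_sai = np.linalg.norm(f_sai - f_star)
```

With the constant gain, the learned signal carries the full inverse-filtered noise of the last repeat; with the 1/(i + 1) gain it carries the inverse-filtered average of all noise realizations, so `err_sai` comes out smaller than `err_basic` in this setup.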

To strike a balance between eliminating the deterministic FF error
and minimizing the noise effect, the following Exponentially
Averaged Inversion (EAI) algorithm is in order.

$$f^{(i+1)} = f^{(i)} - \gamma\,M^{-1}\,(\varepsilon^{(i)}, f^{(i)}) \qquad (24)$$


According to Eq. (20), the f dynamics represent a "stable"
learning process when $B_M$ is dissipative, i.e.,

$$|\lambda_i(B_M)| < 1 \qquad (22)$$

(where $\lambda_i$ are the eigenvalues) and r and v are finite.*

Comparing Eq. (21) with Eq. (14), it is noticed that impulse
response modeling error slows down the convergence and also
increases the potential effect of the noise.

4  Some  Noise  Reduction  Techniques 

It is shown above that process noise or measurement error can
potentially degrade the learning performance. This section
devotes further attention to these issues.

First of all, conventional signal conditioning and noise filtering
techniques should be considered in practical implementations.

Unconventional statistical techniques such as jack-knife sampling
[6] may be used. Next, the basic algorithm may be
modified to account for the noisy conditions. Four (4) such
techniques are described in this paper. Two are discussed in
Section 5: one uses rate constraints to assure the smoothness

*Notice that $G_p^{-1} r$ is the "perfect" FF control signal for
reproducing r at the process output.

*Eq. (22) is the same condition as Eq. (7).

*Non-causal filtering can be used because of the "off-line" nature
of LC.

*For example, fast sampling around a regular sample instant provides
a pseudo-ensemble of measurements from which a good average
reading may be derived without repeating the same experiment
multiple times.


where $0 < \gamma \le 1$ is the exponential averaging constant. When
$\gamma = 1$, the algorithm becomes the basic inversion algorithm.

Alternatively, the following variation allows one to choose $\gamma$
adaptively:


$$f^{(i+1)} = f^{(i)} - M^{-1}\left(\frac{1}{i+1} + \gamma\,\frac{i}{i+1}\right)(\varepsilon^{(i)}, f^{(i)}) \qquad (25)$$


Notice that the first component in the parentheses represents
pure statistical averaging, whereas the second component is a
model error correction term discounted by $\gamma$ ($0 < \gamma < 1$). $\gamma$ can
be chosen according to the ratio between the excess $\|\varepsilon\|^2$ and
$\|\varepsilon\|^2$, where the excess $\|\varepsilon\|^2$ is estimated against the expected
noise level and therefore is an indication of modeling error. This
choice is similar to the way that the (optimal) Wiener filter is
determined.

Finally, it is noted here that the $\gamma$ in EAI can be used to model
the uncertainty in the initial f. If one has high "confidence"
in the initial f, then a small $\gamma$ should be used. However, a more precise
account of "confidence" is outlined in the Kalman filter formulation
described in Section 6.

5  Gradient  Algorithms  &  Constrained 
Optimization 

The basic LC algorithm and its variants described so far require
matrix inversion at each step. In optimization theory, this
amounts to descending along the Newton direction. Often, direct
system inversion may be less desirable for practical reasons.

In such cases, "soft" inversion via Optimal Gradient Descent is
an alternative to exact inversion.

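The Exponentially Averaged version of the Optimal Gradient
Descent Learning is summarized here. Assuming a post i-th
repeat situation, $f^{(i)}$, $\delta f^{(i-1)}$, and $(\varepsilon^{(i)}, f^{(i)})$ are given, where
$(\varepsilon^{(i)}, f^{(i)})$ is the observed learning error with $f^{(i)}$ applied during
the i-th repeat. Also known is the averaged learning error from the
previous repeat, $\bar{\varepsilon}^{(i-1)}$. The procedure is to first update the
averaged learning error, i.e., to compute

$$\bar{\varepsilon}^{(i)} = (1 - \gamma)\,[\bar{\varepsilon}^{(i-1)} + M\,\delta f^{(i-1)}] + \gamma\,(\varepsilon^{(i)}, f^{(i)}) \qquad (26)$$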


whereas  for  i  -  1  =  0. 

With this updated averaged learning error, the gradient, g, of
this error squared is computed with respect to f, and f is then
updated along the negative gradient direction with the optimal
step size, $\mu$:

$$f^{(i+1)} = f^{(i)} - \mu\,g \qquad (27)$$

$$g = M'\,\bar{\varepsilon}^{(i)} \qquad (28)$$

$$\mu = (g'g)/(g'M'Mg) \qquad (29)$$

When $\gamma = 1$ this algorithm becomes the (basic) Optimal Gradient
Learning algorithm. When $\gamma = 1/(i + 1)$, it becomes the
Statistically Averaged version.

Numerically iterating (infinitely) many times with Optimal Gradient
between task repeats amounts to exact inversion. However,
exact inversion is often not a good idea in the face of noise and
uncertainties - a principle also found in iterative image deblurring
[7] and adaptive signal processing [9]. In such cases,
"soft" inversion may be performed by iterating with optimal
gradient descent a finite number of times.
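
The idea of "soft" inversion by a finite number of optimal-gradient iterations can be sketched as follows. This is a minimal illustration, assuming a quadratic cost 0.5 * ||eps + M delta_f||^2 on the predicted error and a hypothetical, well-conditioned impulse response matrix; the function name `soft_inversion` and all numbers are illustrative and not from the paper.

```python
import numpy as np

def soft_inversion(M, eps, n_iter):
    """Approximate delta_f = -M^{-1} eps by n_iter optimal-gradient steps
    on the cost 0.5 * ||eps + M @ delta_f||^2."""
    delta_f = np.zeros(M.shape[1])
    for _ in range(n_iter):
        residual = eps + M @ delta_f      # predicted error after the update
        g = M.T @ residual                # gradient of the cost
        Mg = M @ g
        denom = Mg @ Mg                   # g' M' M g
        if denom == 0.0:                  # gradient vanished: already optimal
            break
        mu = (g @ g) / denom              # optimal step size along -g
        delta_f = delta_f - mu * g
    return delta_f

# Hypothetical 5x5 lower-triangular impulse response matrix.
h = 0.5 ** np.arange(5)
M = sum(np.diag(np.full(5 - k, h[k]), -k) for k in range(5))
eps = np.array([1.0, -0.5, 0.25, 2.0, -1.0])

exact = -np.linalg.solve(M, eps)              # exact inversion
soft = soft_inversion(M, eps, n_iter=3)       # "soft" inversion
many = soft_inversion(M, eps, n_iter=500)     # approaches exact inversion
```

Stopping after a few iterations leaves part of the observed error uncorrected, which is the intended trade-off when that error is partly noise; letting the loop run long enough reproduces the exact inverse in this well-conditioned example.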

Figure 3 illustrates the performance of the Optimal Gradient
method in the presence of process noise. The top plots are the
FB-only control performance (with the solid lines indicating the
command and the dashed lines the noisy process response). The
bottom plots show the Optimal Gradient Learning results after
5 repeats with 10 numerical iterations between task repeats.
Again, in this noisy case, the Optimal Gradient learning is able
to improve over the feedback-only performance. The built-in
noise-averaging mechanism (as described in Section 4) and the
"soft" inversion feature are the only noise reduction methods
used in this example. With signal conditioning and filtering,
additional protection can be expected.

In many practical applications, the magnitude and/or the rate
of control are limited, e.g., by valve position and slew rate. The
following outlines formulations and solutions to constrained
optimization problems in a (batch) linear quadratic (LQ) setting.


With  (Soft)  Constraint  on  the  Magnitude  of  Control 

$$\min J = \min \tfrac{1}{2}\left(\|\varepsilon^{(i+1)}\|_Q^2 + \|f^{(i+1)}\|_R^2\right) \qquad (30)$$

where Q is a weighting matrix on the learning error and R is
the control penalty weighting matrix. Q and R should be (at
least) non-negative definite and (M'QM + R) should be positive
definite. The inversion based solution to this problem is

**This connection points to some robust techniques in the image
and signal processing literature [10, 11]. Also, in [2] a robust
technique is provided.

**Hard constraints may also be formulated using linear or
quadratic programming (LP or QP) type techniques.

$$\delta f^{(i)} = -(M'QM + R)^{-1}\,[M'Q\,\varepsilon^{(i)} + R f^{(i)}] \qquad (31)$$

$$f^{(i+1)} = f^{(i)} + \delta f^{(i)} \qquad (32)$$
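
As a hedged illustration of this inversion-based solution, the sketch below evaluates the update for a small hypothetical example and compares it with plain inversion (R = 0). The matrix sizes, impulse response values and the helper name `lq_ff_update` are assumptions made for the example only.

```python
import numpy as np

def lq_ff_update(M, Q, R, eps, f):
    """delta_f = -(M'QM + R)^{-1} [M'Q eps + R f], the form of Eq. (31)."""
    H = M.T @ Q @ M + R                  # must be positive definite
    rhs = M.T @ Q @ eps + R @ f
    return -np.linalg.solve(H, rhs)

# Hypothetical 4x4 example; none of these numbers come from the paper.
h = np.array([0.2, 0.4, 0.3, 0.2])
M = sum(np.diag(np.full(4 - k, h[k]), -k) for k in range(4))
eps = np.array([1.0, 0.5, -0.5, 1.0])
f = np.zeros(4)
Q = np.eye(4)
R_soft = 0.1 * np.eye(4)

d_exact = lq_ff_update(M, Q, np.zeros((4, 4)), eps, f)   # R = 0: plain inversion
d_soft = lq_ff_update(M, Q, R_soft, eps, f)              # R > 0: penalized control

f_exact = f + d_exact                                     # update of Eq. (32)
f_soft = f + d_soft
```

With R = 0 the update cancels the observed error completely; with a positive R the learned signal is smaller in magnitude, at the price of leaving some error uncorrected.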

Similarities between this constrained problem and the regularization
solution found in (iterative) image deblurring and
restoration problems [7] are worth noting. In that context, the
image is blurred by a transfer function and is also corrupted
by noise. Undoing the blurring involves convolving the noise-corrupted
blurred image with the inverse of the blurring transfer
function. Since the inverse of the blurring transfer function
tends to amplify the noise, the problem becomes ill-conditioned.
In [7], the advantage of "soft" inversion over exact inversion in
image restoration is analyzed and demonstrated, in terms of
noise conditioning.

Furthermore, regularization techniques are used to add constraints
on the deblurred image - usually smoothness constraints
of some kind, e.g., the R matrix could represent a Laplacian
type operator to penalize nonsmooth components [7]. The case
here is slightly different in that it is to limit the magnitude of
the solution, f. This particular "regularization" may prevent
either violation of physical constraints or numerical difficulty
when M is ill-conditioned or when the process is of non-minimum
phase. On the other hand, the inclusion of rate constraints on
the learned f can be motivated by the concerns over the corrupting
noise in $\varepsilon$ as well as over the actuator rate limitations.

With  (Soft)  Constraints  on  the  Magnitude  and  the  Rate 
of  Control 

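$$\min J = \min \tfrac{1}{2}\left(\|\varepsilon^{(i+1)}\|_Q^2 + \|f^{(i+1)}\|_{R_f}^2 + \Delta[f]^{(i+1)\prime}\,R_\Delta\,\Delta[f]^{(i+1)}\right) \qquad (33)$$

where $R_f$ and $R_\Delta$ are the weighting matrices on the magnitude
and the "rate" of control, respectively, and $\Delta[f]$ is the control
increment vector defined as

$$\Delta f(t) = f(t+1) - f(t) \qquad (34)$$

$$\Delta[f] = [\,f(0)\ \ \Delta f(1)\ \ \ldots\ \ \Delta f(N-1)\,]^T \qquad (35)$$

Notice that

$$f = T\,\Delta[f] \qquad (36)$$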


The inversion based solution to this problem is:

$$\delta f^{(i)} = T\,\Delta[\delta f]^{(i)} \qquad (39)$$

$$f^{(i+1)} = f^{(i)} + \delta f^{(i)} \qquad (40)$$

where $\bar{M} = MT$ and $\bar{R} = T' R_f T + R_\Delta$. By using an appropriate
$R_\Delta$, not only can one constrain the rate of control for physical
reasons, but one can also reduce the undesired effect due to the
corrupting noise.

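Soft-Constrained Optimal Gradient Algorithms
With constraint on the magnitude of control:

$$f^{(i+1)} = f^{(i)} - \mu\,g \qquad (41)$$

$$g = M'Q\,\bar{\varepsilon}^{(i)} + R f^{(i)} \qquad (42)$$

$$\mu = (g'g)/(g'(M'QM + R)\,g) \qquad (43)$$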


With  constraints  on  both  the  magnitude  and  the  rate  of  control: 


A[/](*+‘)  =  (44) 

„{i+i)  _  (45) 

3  =  (46) 

M  =  m 


6  The  Kalman  Filtering  Approach 

In this section, we briefly show that the LC problem can also be
formulated as a Kalman filtering problem. Consider the "perfect"
feedforward signal, $f^*$, as the state vector; then the following
state-space description holds:

yKi+l)  =  (48) 

c*<*>  =  (49) 

where  is  the  (post  i-th  repeat)  optimal  tracking  error,  t/*) 
is  the  additive  noise  of  the  process  and  is  a  random  term 
to model any uncertainty or variations associated with $f^*$.
$E = (I + G_pG_c)^{-1}$ and $M_\Gamma$ is a composite transfer function that
relates f and r to e:

$$\bar{e} = M_\Gamma f \qquad (50)$$

$$= Mf + \Gamma r \qquad (51)$$

where $\bar{e}$ is the "clean" tracking error (i.e., if noise does not exist),
and $\Gamma r$ is the portion of e that is due to r. With this decomposition,
it follows that:

$$e^* - e = M_\Gamma (f^* - f) \qquad (52)$$

$$= M(f^* - f) \qquad (53)$$

$$e - Mf = -Mf^* - Ev \qquad (54)$$

where we have used the fact that $0 = M_\Gamma f^*$ and $e^* = -Ev$.
This leads to a new state-space description which facilitates a
Kalman filter solution:

/•(<+») 

=  /KO  4. 

(55) 

1 

il 

(56) 

=  -Af /•<*)  - 

(57) 

Then,  the  Kalman  filter  takes  on  the  following  form: 

y.(.+i)  _ 

/•(<)  +  k(*>(z(*>  -  i<‘>) 

(58) 

i<')  = 

(59) 

where  is  the  Kalman  gain.  Since 

=  c(‘>-M/(*>  (60) 

/(•)  =  /•<•)  (61) 

the  Kalman  filter  equation  is  simply: 

y.(i4l)  _  /•(.)  + (62) 

Note that this is not the standard Kalman filter, as the evolution
is not a function of time but rather a function of the learning
cycle. Noting that the Kalman filter in essence implements a
form of "optimal" system inversion, this Kalman filter connection
does provide a common ground for unifying the various
procedures.

7  Summary 

With all the required process knowledge readily measurable, the
proposed procedures are relatively simple, easy to tune on-site,
and applicable to SISO/MIMO processes, with or without
FB. Not only are encouraging simulation results obtained, but
we have also obtained very encouraging preliminary experimental
results on a semiconductor wafer Rapid Thermal
Processing (RTP) reactor [12]. Moreover, the basic scheme has
also been extended to nonlinear processes with good results [13].
ABSTRACT


Some recent results of using (repetitive) Learning Control (LC) to determine the appropriate
feedforward (FF) control action are reviewed in this paper. This LC approach only requires
information that is readily measurable on-site, and is designed for ease of tuning, as it can
learn from past experiences. It is applicable to a wide range of single or multi-variable, linear
or smoothly nonlinear repetitive processes. Many manufacturing tasks are repetitive and task-oriented
and are potential applications of this LC procedure. Applications of these FF LC methods to
RTP temperature control are the focus of this paper. Both simulation and experimental results
are included.

Although the experiment is exploratory in nature, the results are very encouraging and deserve
serious consideration. Applying this LC approach has resulted in a speed-up of a well-tuned
FB loop by a factor of 8, which amounts to a saving of more than 20 seconds in one processing step
- quite significant for RTP. Additionally, the experiment has demonstrated the applicability of
the LC theory in a real-world manufacturing setting.


1  Introduction 

For Rapid Thermal Processing (RTP) of semiconductor wafers,
the ability to quickly manipulate wafer temperature according
to the commanded temperature profile is crucial. Sensor-based
feedback (FB) control can certainly improve the RTP reactor's
temperature following capability, maintain tight temperature
control at steady state, and reduce the effects due to equipment
variations. However, the speed of FB control must be balanced with
stability considerations, which are often limited by process
characteristics such as time delay.

Feedforward (FF) control, on the other hand, can complement
the FB control performance by promoting non-delay and
anticipatory actions which, when properly designed, can lead to
superior tracking or disturbance rejection. Combining FB and
FF, one could have a robust, stable and yet agile temperature
control system.

Traditional FF design is usually based on analytical methods
that require fairly accurate modeling of the process and the FB
control loop. Such knowledge is often not available or is subject
to change over time. Furthermore, the FF control so designed is
often difficult to tune.

The LC approach described in this paper and in [1, 2, 3], on
the other hand, only requires the FB loop characteristics that
are readily measurable on-site, and is designed for ease of
tuning, as it can learn from past experiences. Many manufacturing
tasks are repetitive and task-oriented and are potential
applications of LC. This LC approach is applicable to a wide
range of processes, single or multi-variable, linear or smoothly
nonlinear.

In this paper, we review these very recent results, including
experimental results of applying the basic LC scheme to an RTP
reactor designed for semiconductor wafer manufacturing. A factor
of 8 speed-up, or a saving of more than 20 seconds in one single
processing step, is realized by using FF LC.

*Work supported in part by ARPA under AFOSR Contract No. F49620-94-C-0003.


The rest of the paper is organized as follows. First, the basic LC
scheme and its variants are reviewed and summarized in Section
2. In Section 3, the RTP experiment is described and shown
with its 100 °C temperature step setpoint following capability.
With a tuned FB-only control, it takes about 25 seconds
to reach the new temperature target at the high end of the
temperature range. We show that, with the application of FF
LC, this time-to-target is reduced to less than 3 seconds - a
factor of 8 speed-up, or a saving of more than 20 seconds for this
one processing step, which is quite significant for RTP.

Section 4 further explains how this basic LC scheme can be
extended to a class of smoothly nonlinear processes, and
demonstrates rapid convergence for a simulated nonlinear RTP process.
Methods for implementing FF LC in a practical system are
discussed in Section 5.

2  The  Basic  Learning  Algorithm  and  Variants

Consider a combined FB and FF control configuration as shown
in Figure 1. The basic LC scheme and its variants can be
described by:

$$f^{(i+1)} = f^{(i)} - \mathcal{K}\,e^{(i)} \qquad (1)$$

where $e^{(i)}$ is the tracking error vector ¹ and $f^{(i)}$ the FF control
signal in the i-th iteration, and $\mathcal{K}$ is a general (linear) operator
which facilitates LC. This LC operator, $\mathcal{K}$, may include
time-advance operations, since the entire $e^{(i)}$ is known after the i-th
iteration.

Realizing that both f and the command r contribute to e, a
straightforward analysis yields the following relationships in
noiseless conditions:

$$f^{(i+1)} = (I - \mathcal{K}\mathcal{M})\,f^{(i)} - \mathcal{K}\Gamma\,r \qquad (2)$$

$$e^{(i+1)} = (I - \mathcal{M}\mathcal{K})\,e^{(i)} \qquad (3)$$

where $\mathcal{M}$ is the transfer operator from f to e, and $\Gamma$ is the
transfer operator from the command r to e.




Notice that if $\mathcal{K}$ is chosen as the inverse of $\mathcal{M}$, i.e., the transfer
operator from f to e, then exact convergence with one task
repeat is possible, and


Figure  1:  Combined  Feedforward  and  Feedback  Configuration 

$$f^{(i+1)} = -\mathcal{M}^{-1}\Gamma\,r \qquad (5)$$

$$= P^{-1} r \qquad (6)$$

$$e^{(i+1)} = 0 \qquad (7)$$

where $f^* = P^{-1} r$ is the "perfect" FF control signal for reproducing
the command trajectory r. ($P^{-1}$ assumes the existence
of a process "right inverse" [4]. A necessary condition is that the
number of process inputs must be no less than the number of outputs.
With the existence of $P^{-1}$, the existence of $\mathcal{M}^{-1}$ for the
closed loop is assured. When the right inverse does not exist,
least-squares solutions can be sought according to a computational
formulation described later.)

A more relaxed condition for convergence is that

$$|\lambda_i(I - \mathcal{M}\mathcal{K})| < 1 \qquad (8)$$

or, when the eigenvalues are real,

$$0 < \lambda_i(\mathcal{M}\mathcal{K}) < 2 \qquad (9)$$

where $\lambda_i$ denotes the eigenvalues. This relaxed condition allows
the use of a broader scope of algorithms [1], ranging from cases
where $\mathcal{M}^{-1}$ is not known exactly (e.g., due to measurement
imprecision or process nonlinearity) to gradient descent type
optimization algorithms. Now, instead of converging in one repeat,
it will take multiple repeats to achieve satisfactory results.
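
The relaxed condition can be illustrated with a small numerical sketch. Everything in it is an assumption made for the example: the "true" and "measured" impulse responses, the matrix size, and the choice of K as the inverse of the imprecise model. It checks the eigenvalue condition and iterates the noiseless error recursion of Eq. (3).

```python
import numpy as np

def lower_toeplitz(h):
    """Lower-triangular Toeplitz matrix whose first column is h."""
    n = len(h)
    return sum(np.diag(np.full(n - k, h[k]), -k) for k in range(n))

n = 8
M = lower_toeplitz(0.8 ** np.arange(n))              # hypothetical "true" f-to-e matrix
M_hat = lower_toeplitz(1.3 * 0.75 ** np.arange(n))   # imprecise measurement of M
K = np.linalg.inv(M_hat)                             # learning operator built from M_hat

# Relaxed condition: all eigenvalues of (I - MK) lie inside the unit circle.
A_err = np.eye(n) - M @ K
rho = np.max(np.abs(np.linalg.eigvals(A_err)))

# Noiseless error recursion e^(i+1) = (I - MK) e^(i).
e = np.ones(n)
norms = [np.linalg.norm(e)]
for _ in range(10):
    e = A_err @ e
    norms.append(np.linalg.norm(e))
```

With a 30 percent gain error and a mismatched decay rate, one repeat no longer removes the error, but the error norm still shrinks geometrically over the repeats because the eigenvalue condition holds.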

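Computationally, this basic algorithm involves the following
(assuming the closed-loop FB system is linear time-invariant):

$$-e = -\begin{bmatrix} e(2) \\ e(3) \\ e(4) \\ \vdots \\ e(N+1) \end{bmatrix} \qquad (10)$$

$$= \begin{bmatrix} h_1 & 0 & 0 & \cdots & 0 \\ h_2 & h_1 & 0 & \cdots & 0 \\ h_3 & h_2 & h_1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ h_N & h_{N-1} & h_{N-2} & \cdots & h_1 \end{bmatrix} \begin{bmatrix} \delta f(1) \\ \delta f(2) \\ \delta f(3) \\ \vdots \\ \delta f(N) \end{bmatrix} \qquad (11)$$

$$= M\,\delta f \qquad (12)$$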

¹Under certain conditions, the FB controller output $u_{fb}$ may be used
instead of the tracking error e. Actually, it is the deviation of $u_{fb}$ from
its initial steady state that should be used for iterative learning. This is
because if the FF is doing a perfect job for the transient, then the FB
controller would not have to labor.


where $h_1, h_2, \ldots, h_N$ is the closed-loop impulse response sequence
from f to e. Its state-space equivalent would be $CB, CAB, CA^2B,
\ldots, CA^{N-1}B$, where [A, B, C] stands for a state-space
representation of the f to e process. ² Matrix $M$ is a time-domain
realization of the transfer operator, $\mathcal{M}$. ³
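
The construction of the impulse response matrix can be sketched for the SISO case as follows. The first-order state-space model, the horizon N and the helper name `impulse_matrix` are hypothetical choices for illustration; the sketch builds M from the Markov parameters and cross-checks the product of M with an FF increment against a direct simulation from rest.

```python
import numpy as np

def impulse_matrix(h):
    """Lower-triangular Toeplitz matrix M in the form of Eq. (11), SISO case."""
    n = len(h)
    M = np.zeros((n, n))
    for row in range(n):
        for col in range(row + 1):
            M[row, col] = h[row - col]    # h[0] is h_1, h[1] is h_2, ...
    return M

# Hypothetical first-order SISO state-space model of the f-to-e path.
A = np.array([[0.8]])
B = np.array([[1.0]])
C = np.array([[-0.5]])
N = 6

# Markov parameters h_k = C A^(k-1) B for k = 1..N.
h = np.array([(C @ np.linalg.matrix_power(A, k) @ B).item() for k in range(N)])
M = impulse_matrix(h)

# Cross-check: simulate x(t+1) = A x(t) + B f(t), e(t) = C x(t) from rest.
delta_f = np.array([1.0, 0.0, -2.0, 0.5, 0.0, 1.0])
x = np.zeros((1, 1))
e_sim = []
for t in range(N):
    x = A @ x + B * delta_f[t]
    e_sim.append((C @ x).item())          # response at t+1, stacked like e(2)..e(N+1)
e_sim = np.array(e_sim)
```

The one-sample shift between the input index and the stacked response reflects the single sample delay assumed in the equations; for a MIMO loop each scalar entry would become a matrix block.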

Using the impulse response representation directly facilitates
on-site measurement and simpler computation. It also allows
greater flexibility in representing the process dynamics than a
parametric model.

When the process is not right invertible, the M matrix would not
be invertible. In that case, a least-squares (LS) solution can still
be sought. Constraints can be further used in the LS formulation
to help shape the FF solution. For instance, by adding a penalty
on the magnitude of FF control, one can discourage excessively
large control, which may result from a non-minimum phase process
or an excessively demanding transient command. Likewise,
the rate of the learned FF control can also be constrained to
avoid actuator rate saturation and minimize the potential effect
of noise [1]. Extension to include hard constraints can be made
using linear programming (LP) or quadratic programming (QP)
type formulations.

This procedure is validated using a simulated 3x3 multi-input
multi-output (MIMO) process stabilized with decentralized PI
controllers. Figure 2 illustrates the system performance in tracking
a ramp command. The top plots show the tracking performance
with feedback control only; the bottom plots show
the performance after one repeat of the task. The impulse response
matrix is estimated by differencing the "measured" step
responses. The simulation is done using the graphics-oriented
control design software, SystemBuild and MATRIXx. The solid
lines indicate the command trajectories and the dashed lines the
actual process responses. It is apparent that in this idealized linear
time-invariant situation, the basic LC approach can achieve
perfect tracking in one learning repeat. Good performance is
also obtainable in a noisy environment by taking the proper noise
reduction measures discussed in [1].

A question in order is: would this LC approach work well in a
rugged industrial manufacturing environment? A major portion
of this paper is devoted to reporting some preliminary experimental
results of applying LC to wafer temperature tracking in
an RTP reactor.

3  The  RTP  Reactor  and  Wafer  Temperature  Control

The reactor on which the experiments were conducted uses
rapid thermal processing (RTP) technology to process one wafer
at a time. According to a pre-designed temperature command
profile, the wafer is raised to high temperatures in a few stages
during the course of the process and then cooled down. It
is crucial to maintain tight temperature control at the set
temperature, but it is also very important to quickly respond to the
temperature command without overshoot. The FB controller is
designed with an anti-windup PI and a lead/lag compensator in
the forward path. The lead/lag is introduced to offset some of

²If $u_{fb}$ is used for learning, then the impulse response sequence should
be from f to $u_{fb}$.

³Equations (10) and (11) have assumed that the only time delay is the
sample delay, i.e., 1 delay. If additional process delays exist, some of the
leading h's would be zero and should be removed from the equations,
together with the corresponding leading e's.


the phase lag due to the thermal capacitance induced process
delay. (Process delays of about 1.25 seconds to 1.5 seconds are
observed.) The FB controller is also gain-scheduled throughout


Figure 2: Simulated Feedforward LC (MIMO case)


the whole operational range to maximize the performance uniformity.
At the high end of the temperature range, it is
more difficult to achieve high speed and stability simultaneously,
and stability is given higher priority.


Figure 3 shows a "high-end" wafer temperature response to a
step command from 800 °C to 900 °C. The settling time is somewhere
around 25 seconds - sluggish compared to the "low-end"
performances. The following discussions show how we use this
information to design a FF control signal that drastically speeds
up the temperature response.


To carry out the FF learning procedure, one needs to measure
the impulse response characterizing the transfer function from
the FF signal to the temperature tracking error, e, or to the FB
controller output, $u_{fb}$. One way to take such measurements is
to inject FF steps and measure the associated responses. That
is, special system identification experiments would have to be
conducted.


For the single-input single-output linear case, however, the transfer
function from the command to wafer temperature is equivalent to the
transfer from the FF signal to the FB controller output except for a
difference in sign. Therefore, we use the temperature step response
obtained above to derive the FF to FB impulse response. The step
response, however, is contaminated by the square-wave type measurement
disturbances. These square-wave disturbances are inevitably introduced
by the existing sensor.

Figure 3: Wafer Temperature Response to Temperature Step
Command (FB-Only Control)


These disturbances must be removed in order to obtain a meaningful
impulse response reading. This noise removal can be carried
out by using (non-causal) low-pass filtering or by fitting
a low-order system to the step response. The particular
non-causal filtering we used is a form of jack-knife subsampling [6]
and (cubic) spline interpolation scheme. Jack-knife sampling is
effective for estimating ensemble average without actually
conducting multiple experiments.


Next, the filtered step response is time differenced and sign
reversed to produce an estimate of the FF to FB impulse response.
Due to time differencing, the “tail” of the impulse response still
exhibits some oscillations from the residual noise. To improve
the steady-state performance of the FF solution, the tail is
further smoothed out. * The result is shown in Figure 4.


Another needed ingredient is the “error” signal which is the
difference between the desired and the actual. What is the desired?
Since the step command is still driving the FB loop, and for
practical reasons we certainly do not want the temperature to
respond like a step, a reference model is introduced to generate
the desired response. A first-order reference model with rise time
of about 2 seconds is used to provide the desired temperature
response. However, since we use the FB output as an indication
of how well the system performs, a desired FB signal must be
computed. This desired FB signal is generated by feeding the
difference between the step command and the desired temperature
response to a copy of the FB controller. This FB controller’s
output is then the desired FB.

Figure 4: The Estimated Impulse Response Model

With the desired and the actual FB, a FB output based error
signal is obtained. (This error signal is time-advanced according
to Eq. (11) and the comment thereafter to account for the
process and sample delay.) This time-advanced error signal and
the impulse response vector were then used by the LC algorithm
to compute a FF signal. The algorithm computes an appropriate
FF signal that would minimize this error according to the
impulse response model. For the unconstrained situation, this
minimization can be accomplished by exact inversion. However, it
is known that exact inversion is often not a good idea in the
face of uncertainties, a principle also found in iterative image
deblurring [7] and adaptive signal processing [8]. We instead use
soft inversion by numerically iterating with Optimal Gradient
Descent [5, 1].


Thirty (30) numerical iterations are performed between process
repeats. The resultant FF signal is then injected with proper
time synchronization. The result is a significantly faster
response. The (to 90%) rise time is about three (3) times shorter
than that of FB only control and yet without overshoot. However,
it is noticed that the rise time is still longer than the desired,
and the wafer temperature starts the transition a bit too early
(about 0.8 second or so). A few factors could have contributed
to this, including measurement and numerical errors and perhaps
process nonlinearity that would affect the fidelity of impulse
response (linear) modeling.

*It is noted that this “tail” problem could be avoided if the impulse
response is generated by a parametric model fitted to the data.

Using the same FF signal but delaying it by 0.5 second, a very
good result is obtained and is displayed in Figure 5. Notice
the calculated anticipatory action of the FF signal that precedes
the step command. The wafer temperature starts the upward
transition just on time and hits the target in less than 3 seconds
with no overshoot. Compared to the 25 seconds or so settling
time with FB only control, this means a saving of more than 20
seconds for just one processing step, which is quite significant for
RTP.

Finally, “robustness” of the combined FF and FB scheme is
revealed in the following two cases. If the (first) FF is delayed
by 1 second (i.e., twice as much as the 0.5 second delay), the
wafer temperature is still well behaved and responds fast, but
with about 7°C overshoot (for this 100°C step). This provides a
“feel” for the robustness of FF control. Another robustness
indicator is the repeatability run which was performed three days
after the first run was made and showed remarkable consistency.
In between, the reaction chamber was taken apart once for
examination and that apparently did not impact the performance.
The relatively tight closed-loop FB control should be (at least
partially) credited for the apparent robustness. This is one of
the reasons for using the combined FF and FB configuration.
Figure 5: Wafer Temperature with FF Learning

4  LC  for  Nonlinear  Processes 

In this section, we briefly describe an extension of the basic LC
scheme to a class of smoothly nonlinear processes. For nonlinear
processes, the notion of “impulse response” is not exact, and
varies with the process state and input. Using the “impulse
response” measured at a certain process condition would not, in
general, adequately describe the process behavior throughout a
(temperature) control task. The model used for this study is a
nonlinear RTP model which is fitted to some measured data to
reflect nonlinear process gain and time constant. Direct
application of the basic LC scheme (with full strength) has resulted
in learning oscillations and apparent divergence. With
reduced strength, the basic learning scheme would still converge,
but only after many iterations.

Instead,  using  a  prediction  error  based  scheme  described  in  [3], 
the  “impulse  response”  model  can  be  adapted  during  the  iterative 
learning  process,  resulting  in  rapid  convergence.  Figure  6  shows 
such  an  adaptive  learning  sequence  that  converges  to  the  desired 
reference  response  in  2  or  3  adaptations.  (The  feedback  only 
response  which  is  not  shown  is  an  order  of  magnitude  slower.) 


Figure 6: An Adaptive LC Sequence - for a nonlinear RTP process


5  Implementing  LC 


The LC scheme discussed so far is signal synthesis based, i.e., an
appropriate FF signal, $f$, is determined for a specific task and
recipe. A FF data base can be constructed for a set of recipes.
When faced with a new recipe, however, one would have to either
interpolate from the FF data base, or likely relearn the FF signal.
Then, the FF data base needs to be updated accordingly.

A more effective method for implementing LC in a practical system
is to use a dynamic FF filter which is driven by the recipe
command to generate the appropriate FF signal. In this way,
the need for a (growing) FF data base and for relearning is
eliminated. Based on the signal synthesis based LC, methods for
realizing a dynamic filter based implementation have been
developed and are described in [9].

6  Summary 


The feedforward Learning Control (LC) methodology is shown
to be applicable to the control of repetitive industrial and
manufacturing processes in general and of the wafer temperature in a
typical RTP reactor, in particular. The preliminary experiments
have demonstrated a speed-up by a factor of 8, or 20 seconds
saving in a single processing step, quite significant for RTP.

With all the required ingredients readily measurable, the learning
scheme, as described here and in [1], is suitable for on-site
tuning and learning. The very encouraging experimental results
seem to strongly support the applicability of the theory and
deserve serious considerations for practical deployment.
ABSTRACT

Recent algorithmic results in (feedforward) Learning
Control (LC) [1] are applied to the control of semiconductor
wafer temperature in a Rapid Thermal Processing (RTP)
reactor. Although the first attempt is experimental in nature,
the results are very encouraging and deserve serious
consideration.

Applying this LC approach has resulted in a speed-up
of a well-tuned FB loop by a factor of 8, which amounts
to more than 20 seconds saving in one processing step,
quite significant for RTP. Additionally, the experiment
has demonstrated the applicability of the LC
theory in a real-world manufacturing setting. Many
repetitive manufacturing tasks are potential applications
of this LC procedure.

1  Introduction 

For Rapid Thermal Processing (RTP) of semiconductor
wafers, the ability to quickly manipulate wafer
temperature according to the commanded temperature
profile is crucial. Sensor-based feedback (FB)
control can certainly improve the RTP reactor’s
temperature following capability, maintain tight temperature
control at steady state, and reduce the effects
due to equipment variations. However, the speed of
FB control must be balanced with stability considerations,
and is often limited by the process characteristics
such as time delay.

Feedforward  (FF)  control,  on  the  other  hand,  can 
complement  the  FB  control  performance  by  promot¬ 
ing  non-delay  and  anticipatory  actions  which,  when 
properly  designed,  can  lead  to  superior  tracking  or 
disturbance  rejection.  Combining  FB  and  FF,  one 
could  have  a  robust,  stable  and  yet  agile  temperature 
control  system. 

*Work supported by ARPA under AFOSR Contract No.
F49620-94-C-0003.

Traditional FF design is usually based on analytical
methods that require fairly accurate modeling of the
process and the FB control loop. Such knowledge is
often not available or is subject to change over time.
Furthermore, the so designed FF control is often
difficult to tune.

The LC approach described in [1], on the other hand,
only requires the FB loop characteristics that are readily
measurable on-site, and is designed for the ease
of tuning, as it can learn from the past experiences
via task repetition. (Many manufacturing tasks are
repetitive task-oriented and are potential applications
of LC.) This LC approach is therefore applicable to a
wide range of processes, single or multivariable, linear
or smoothly nonlinear. ^1

In  this  paper,  we  report  our  experimental  results  of 
applying  the  basic  LC  scheme  to  an  RTP  reactor  de¬ 
signed  for  semiconductor  wafer  manufacturing.  A  fac¬ 
tor  of  8  speed-up,  or  more  than  20  seconds  saving  in 
one  processing  step,  is  realized  by  using  learning  FF 
control. 

The rest of the paper is organized as follows. First,
the basic LC scheme and its variants are reviewed
and summarized in Section 2. In Section 3, the RTP
process is described and shown with its 100°C
temperature step following capability. With a tuned FB
only control, it takes about 25 seconds or so to reach
the new temperature target. We show, in Section 4,
that with the application of FF LC, this to-target time
is reduced to less than 3 seconds, a factor of 8 speed-up
or a saving of more than 20 seconds for this one
processing step, which is quite significant for RTP. The
details of conducting this FF LC experiment are also
included in Section 4. The robustness of the results is
confirmed by a repeatability test.


^1 This simple and effective LC scheme has also been extended
to the control of a class of nonlinear processes. Details of that
extension are described in a separate paper [2].
Figure 1: Combined Feedforward and Feedback Configuration

2  The Basic Learning Algorithm and Variants

Consider a combined FB and FF control configuration
as shown in Figure 1. The basic LC scheme and its
variants can be described by:

$$f^{(i+1)} = f^{(i)} - K e^{(i)} \qquad (1)$$

where $e^{(i)}$ is the tracking error vector ^2 and $f^{(i)}$ the FF
signal in the $i$-th iteration, and $K$ is a general (linear)
operator which facilitates LC. This LC operator, $K$,
may include time-advance operations to allow proper
credit assignment for the learning process.

Realizing that both $f$ and the command $r$ contribute
to $e$, straightforward analysis yields the following
relationships in noiseless conditions:

$$f^{(i+1)} = (I - K\mathcal{M}) f^{(i)} - K \Gamma r \qquad (2)$$

$$e^{(i+1)} = (I - \mathcal{M}K) e^{(i)} \qquad (3)$$

where $\mathcal{M}$ is the transfer operator from $f$ to $e$, and $\Gamma$
is the transfer operator from the command $r$ to $e$:

$$\mathcal{M} = -(I + \mathcal{P}C)^{-1} \mathcal{P} \qquad (4)$$

$$\Gamma = I - (I + \mathcal{P}C)^{-1} \mathcal{P}C \qquad (5)$$

Notice that if $K$ is chosen as the inverse of $\mathcal{M}$, i.e., the
transfer operator from $f$ to $e$, then exact convergence follows:

$$f^{(i+1)} = (I - \mathcal{M}^{-1}\mathcal{M}) f^{(i)} - \mathcal{M}^{-1} \Gamma r \qquad (6)$$

$$\phantom{f^{(i+1)}} = \mathcal{P}^{-1} r \qquad (7)$$

$$e^{(i+1)} = 0 \qquad (8)$$

where $f = \mathcal{P}^{-1} r$ is the “perfect” FF control signal for
reproducing the command trajectory $r$. ($\mathcal{P}^{-1}$ assumes
the existence of process “right inverse” [3, 1]. A necessary
condition is that the number of process inputs
must be no less than the number of outputs. With the
existence of $\mathcal{P}^{-1}$, the existence of $\mathcal{M}^{-1}$ for the
closed-loop is assured. When the right inverse does not exist,
least-squares solutions can be sought according to the
following computational formulation.)

A more relaxed condition for convergence is that

$$|\lambda_i(I - MK)| < 1 \qquad (9)$$

or

$$0 < \lambda_i(MK) < 2 \qquad (10)$$

where $\lambda_i$ denotes the eigenvalues, and matrix $M$ is
a time-domain (Toeplitz) realization of the transfer
operator, $\mathcal{M}$, as shown below in Eq. (12). This
relaxed condition allows the use of a broader scope
of algorithms, ranging from cases where $M$ is not
known exactly (e.g., due to measurement imprecision
or process nonlinearity) to gradient descent type of
optimization algorithms. Now, instead of converging
in one repeat, it will take multiple repeats to achieve
satisfactory results.

Computationally, this basic algorithm involves the
following (assuming the closed-loop FB system is linear
time-invariant):

$$e^{(i)} = [\, e(2),\ e(3),\ e(4),\ \ldots \,]^{T} \qquad (11)$$

$$M = \begin{bmatrix} h_1 & 0 & \cdots & 0 \\ h_2 & h_1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ h_N & h_{N-1} & \cdots & h_1 \end{bmatrix} \qquad (12)$$

$$f^{(i+1)} = f^{(i)} - K e^{(i)} \qquad (13)$$

$$\phantom{f^{(i+1)}} = f^{(i)} - M^{-1} e^{(i)} \qquad (14)$$

where $h_1, h_2, h_3, \ldots, h_N$ is the measurable closed-loop
impulse response sequence from $f$ to $e$. Its state-space
equivalent would be $CB, CAB, \ldots$, where $(A, B, C)$ stands
for a state-space representation of the $f$ to $e$ process. ^3 ^4

^2 Under certain conditions, the $\tilde{u}_{fb}$ signal, instead of the
tracking error, $e$, may be used for iterative learning, where $\tilde{u}_{fb}$
is the deviation of the FB controller output from a reference
$u_{fb}$.
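The iteration described in this section can be illustrated with a small numerical sketch. It is not the authors' implementation: the impulse response values, the FB-only error, and all function names below are hypothetical, and the model matrix is assumed to be a lower-triangular Toeplitz matrix built from the measured impulse response. The sketch first uses an exact model (the error vanishes after one repeat), then a model with a 50% gain error (the error shrinks by a fixed ratio per repeat, consistent with the relaxed eigenvalue condition).

```python
def toeplitz_lower(h):
    """Lower-triangular Toeplitz matrix built from an impulse response h."""
    n = len(h)
    return [[h[i - j] if i >= j else 0.0 for j in range(n)] for i in range(n)]


def matvec(m, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def forward_solve(m, b):
    """Solve m x = b for a lower-triangular m with a nonzero diagonal."""
    x = []
    for i, row in enumerate(m):
        partial = sum(row[j] * x[j] for j in range(i))
        x.append((b[i] - partial) / row[i])
    return x


def run_repeats(m_true, m_model, e_no_ff, repeats):
    """Repeat the task, updating f <- f - m_model^{-1} e after each run.

    Returns the peak absolute tracking error observed in each repeat.
    """
    f = [0.0] * len(e_no_ff)
    peaks = []
    for _ in range(repeats):
        # Error seen in this repeat: FB-only error plus the FF contribution.
        e = [d + y for d, y in zip(e_no_ff, matvec(m_true, f))]
        peaks.append(max(abs(v) for v in e))
        correction = forward_solve(m_model, e)
        f = [fi - ci for fi, ci in zip(f, correction)]
    return peaks


h_true = [-0.5, -0.3, -0.1, -0.05]   # hypothetical f-to-e impulse response
e_no_ff = [1.0, 0.8, 0.6, 0.4]       # hypothetical FB-only tracking error

# Exact model: the learning operator is the true inverse.
exact = run_repeats(toeplitz_lower(h_true), toeplitz_lower(h_true), e_no_ff, 3)

# Measured model with a 50% gain error: eigenvalues of MK are 1/1.5.
h_model = [1.5 * v for v in h_true]
mismatch = run_repeats(toeplitz_lower(h_true), toeplitz_lower(h_model), e_no_ff, 6)
```

With the gain error, every eigenvalue of the product of the true matrix and the learning operator equals 1/1.5, which lies between 0 and 2, so the error contracts by a factor of 1/3 per repeat instead of vanishing at once.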

When the process is not right invertible, the $M$ matrix
would not be invertible. In that case, a least-squares
(LS) solution can still be sought. Constraints can
be further used in the LS formulation to help shape
the FF solution. For instance, by adding penalty on
the magnitude of FF control, one can discourage
successively large control which may result from a
non-minimum phase process or an overly demanding
transient command. Likewise, the rate of the learned FF
control can also be constrained to avoid actuator rate
saturation and minimize the potential effect of noise.
Extension to include hard constraints can be made
using linear programming (LP) or quadratic programming
(QP) type of formulations.
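As a hedged illustration of the magnitude penalty mentioned above, the sketch below solves a penalized least-squares problem through its normal equations. The penalty weight `rho`, the impulse response values, and the function names are assumptions made for the example; the text does not specify the actual formulation or weights used.

```python
def toeplitz_lower(h):
    """Lower-triangular Toeplitz matrix built from an impulse response h."""
    n = len(h)
    return [[h[i - j] if i >= j else 0.0 for j in range(n)] for i in range(n)]


def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def solve(a, b):
    """Gaussian elimination with partial pivoting for a square system."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def penalized_ff(m, e_no_ff, rho):
    """Minimize ||e_no_ff + m f||^2 + rho ||f||^2 via the normal equations."""
    mt = transpose(m)
    n = len(m)
    normal = [[sum(mt[i][k] * m[k][j] for k in range(n)) for j in range(n)]
              for i in range(n)]
    for i in range(n):
        normal[i][i] += rho              # magnitude penalty on the FF signal
    rhs = [-v for v in matvec(mt, e_no_ff)]
    return solve(normal, rhs)


def norm(x):
    return sum(v * v for v in x) ** 0.5


h = [-0.2, -0.5, -0.3, -0.1]             # hypothetical impulse response
m = toeplitz_lower(h)
e_no_ff = [1.0, 0.8, 0.6, 0.4]           # hypothetical FB-only tracking error

f_exact = penalized_ff(m, e_no_ff, 0.0)  # no penalty: exact inversion
f_soft = penalized_ff(m, e_no_ff, 0.05)  # penalized: smaller FF signal
residual_exact = [d + y for d, y in zip(e_no_ff, matvec(m, f_exact))]
```

In this example the unpenalized solution grows from sample to sample, while the penalized solution trades a small residual error for a smaller FF signal. A rate penalty could be added in the same way by penalizing a differenced copy of `f`.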


Figure  2:  Simulated  Feedforward  LC  (MIMO  case) 


This procedure is validated using a simulated 3x3
multi-input multi-output (MIMO) process stabilized
with decentralized PI controllers. Figure 2 illustrates
the system performance in tracking a ramp command.
The top plots show the tracking performance with
feedback control only; the bottom plots show the
performance after one repeat of the task. The impulse
response matrix is estimated by differencing the
“measured” step responses. The simulation is done using
the graphics oriented control design software,
SystemBuild and MATRIXx. The solid lines indicate the
command trajectories and the dashed lines the actual
process responses. It is apparent that in this idealized
linear time-invariant situation, the basic LC approach
can achieve perfect tracking in one learning repeat.
Good performance is also obtainable in a noisy
environment by taking proper noise reduction measures
discussed in [1].

A question in order is: would this LC approach work
well in a rugged industrial manufacturing environment?
The rest of the paper is devoted to reporting
some preliminary experimental results of applying LC
to wafer temperature tracking in an RTP reactor.

3  The  RTP  Reactor  and  Wafer 
Temperature  Control 

The reactor, on which the experiments were conducted,
uses the rapid thermal processing (RTP) technology
to process one wafer at a time. According to a
pre-designed temperature command profile, the wafer
is raised to high temperatures in a few stages during
the course of the process and then cooled down.
It is crucial to maintain tight temperature control at
the set temperature, but it is also very important to
quickly respond to the temperature command without
overshoot. The FB controller is designed with an
anti-windup PI and a lead/lag compensator in the forward
path. The lead/lag is introduced to offset some of
the phase lag due to the thermal capacitance induced
process delay. (Process delays of about 1.25 seconds
to 1.5 seconds are observed.) The FB controller is
also gain-scheduled throughout the whole operational
range to maximize the performance uniformity. However,
at the high end of the temperature range, it is
more difficult to achieve high speed and stability
simultaneously, and stability is given higher priority.

^3 If $u_{fb}$ is used for learning, then the impulse response
sequence should be from $f$ to $u_{fb}$.

^4 Equations (11) and (12) have assumed that the only time
delay is the sample delay, i.e., 1 delay. If additional process
delays exist, some of the leading $h$'s would be zero and should
be removed from the equations, together with the corresponding
leading $e$'s.

Figure 3 shows a “high-end” wafer temperature
response to a step command from 800°C to 900°C. The
settling time is somewhere around 25 seconds, sluggish
compared to the “low-end” performances. The
next Section discusses how we use this information to
design a FF control signal that drastically speeds up
the temperature response.

4  Feedforward  LC  Applied  to 
the  RTP  Reactor 

To carry out the FF learning procedure, one needs to
measure the impulse response characterizing the transfer
function from the FF signal to the temperature
tracking error, $e$, or to the FB controller output, $u_{fb}$.
One way to take such measurements is to inject FF
steps and measure the associated responses. That is,
special system identification experiments would have
to be conducted.

Figure 3: Wafer Temperature Response to Temperature
Step Command (FB-Only Control)




For the single-input single-output linear case, however,
the transfer function from the command to wafer
temperature is also equivalent to the transfer from the
FF signal to the FB controller output except for a
difference in sign. Therefore, we use the temperature
step response obtained above to derive the FF to
FB impulse response. The step response, however, is
contaminated by the square-wave type measurement
disturbances. These square-wave disturbances are
inevitably introduced by the existing sensor.

These disturbances must be removed in order to obtain
a meaningful impulse response reading. This noise
removal can be carried out by using (non-causal)
low-pass filtering or by fitting a low-order system to the
step response. The particular filtering we used is a
jack-knife subsampling and (cubic) spline interpolation
scheme. The contaminated step response (sampled
at a 20 Hz rate) is first subsampled every 10 samples.
Then, at each subsampled point, the temperature
reading is replaced by the average reading around
that point (i.e., the average of readings at the original
20 Hz rate around that point). Mean averaging with a
window of size 15 is used. This is a form of jack-knife
sampling [5] which is effective for estimating ensemble
average without actually conducting multiple experiments.
These subsampled and averaged data points
are then interpolated back using the (cubic) spline.
This filtering scheme performs very well. Figure 4
compares the step response before and after smoothing.
(Other non-causal filtering schemes such as taking
the average of forward and backward Butterworth
low-pass filtered results may also be adopted.)

Figure 4: Step Response Smoothing

Figure 5: The Estimated Impulse Response Model
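The subsample, average, and spline steps described above can be sketched as follows. This is only an illustration under stated assumptions: the averaging window is taken as centered and truncated at the ends of the record, the last sample is kept as a knot to avoid extrapolation, and a natural cubic spline (zero end curvature) is used because the text does not say which end conditions were applied. All function names and the synthetic test signal are hypothetical.

```python
import bisect


def window_mean(samples, center, width):
    """Mean over a window of `width` samples centered on `center`
    (the window is truncated at the ends of the record)."""
    half = width // 2
    lo = max(0, center - half)
    hi = min(len(samples), center + half + 1)
    return sum(samples[lo:hi]) / (hi - lo)


def natural_cubic_spline(xk, yk):
    """Return a function evaluating the natural cubic spline through (xk, yk)."""
    n = len(xk) - 1
    h = [xk[i + 1] - xk[i] for i in range(n)]
    # Tridiagonal system for the knot second derivatives (zero at both ends).
    lower = [0.0] * (n + 1)
    diag = [1.0] * (n + 1)
    upper = [0.0] * (n + 1)
    rhs = [0.0] * (n + 1)
    for i in range(1, n):
        lower[i] = h[i - 1]
        diag[i] = 2.0 * (h[i - 1] + h[i])
        upper[i] = h[i]
        rhs[i] = 6.0 * ((yk[i + 1] - yk[i]) / h[i] - (yk[i] - yk[i - 1]) / h[i - 1])
    for i in range(1, n + 1):            # forward elimination (Thomas algorithm)
        w = lower[i] / diag[i - 1]
        diag[i] -= w * upper[i - 1]
        rhs[i] -= w * rhs[i - 1]
    curv = [0.0] * (n + 1)
    curv[n] = rhs[n] / diag[n]
    for i in range(n - 1, -1, -1):       # back substitution
        curv[i] = (rhs[i] - upper[i] * curv[i + 1]) / diag[i]

    def evaluate(x):
        i = min(max(bisect.bisect_right(xk, x) - 1, 0), n - 1)
        a, b = xk[i + 1] - x, x - xk[i]
        return ((curv[i] * a ** 3 + curv[i + 1] * b ** 3) / (6.0 * h[i])
                + (yk[i] / h[i] - curv[i] * h[i] / 6.0) * a
                + (yk[i + 1] / h[i] - curv[i + 1] * h[i] / 6.0) * b)

    return evaluate


def smooth_step_response(samples, stride=10, width=15):
    """Subsample every `stride` samples, replace each kept point by a local
    mean, then interpolate back to the original rate with a cubic spline."""
    knots = list(range(0, len(samples), stride))
    if knots[-1] != len(samples) - 1:
        knots.append(len(samples) - 1)   # keep the last sample as a knot
    means = [window_mean(samples, k, width) for k in knots]
    spline = natural_cubic_spline(knots, means)
    return [spline(k) for k in range(len(samples))]


# Synthetic record: a constant level corrupted by a +/-1 square wave.
square = [1.0 if (k % 4) < 2 else -1.0 for k in range(101)]
raw = [5.0 + s for s in square]
smoothed = smooth_step_response(raw)
```

On this synthetic record the square-wave disturbance of amplitude 1 is reduced to a small ripple around the constant level, while the output keeps the original sample count.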

Next,  the  filtered  step  response  is  time  differenced 
and  sign  reversed  to  produce  an  estimate  of  the  FF 
to  FB  impulse  response.  Due  to  time  differencing, 
the  “tail”  of  the  impulse  response  still  exhibits  some 
oscillations  from  the  residual  noise.  To  improve  the 
steady-state  performance  of  the  FF  solution,  the  tail 
is  further  smoothed  out.  The  resultant  impulse  re¬ 
sponse  model  (with  1.25  second  process  time  delay 
removed)  is  shown  in  Figure  5.  ^ 
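The differencing and sign reversal described above amount to a few lines of arithmetic. The sketch below is an assumption-laden illustration: the normalization by the step size, the function name, and the synthetic first-order step record are not taken from the text, and the tail smoothing and delay removal steps are left out.

```python
def impulse_from_step(step_response, step_size):
    """Difference a sampled step response and reverse its sign.

    Dividing by the step size (an assumption, not stated in the text)
    normalizes the result to a unit step.
    """
    return [-(step_response[k] - step_response[k - 1]) / step_size
            for k in range(1, len(step_response))]


# Hypothetical smoothed response to a step command from 800 to 900 degrees.
step = [800.0 + 100.0 * (1.0 - 0.9 ** k) for k in range(50)]
h = impulse_from_step(step, 100.0)
```

Because the differences telescope, the samples of `h` sum to the negative of the normalized total temperature change, which gives a quick consistency check on a measured record.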

^5 It is noted that this “tail” problem could be avoided if the
impulse response is generated by a parametric model fitted to
the data.

Another needed ingredient is the “error” signal which
is the difference between the desired and the actual.
What is the desired? Since the step command is still
driving the FB loop, and for practical reasons we
certainly do not want the temperature to respond like a
step, a reference model is introduced to generate the
desired response. A first-order reference model with
rise time of about 2 seconds is used to provide the
desired temperature response. However, since we use
the FB output as an indication of how well the system
performs, a desired FB signal must be computed. This
desired FB signal is generated by feeding the difference
between the step command and the desired temperature
response to a copy of the FB controller. This
FB controller's output is then the desired FB. This is
because, in our configuration, though the desired
temperature is the output of a first-order reference model,
the (step) command is still driving the real-time FB
controller. When and if the desired temperature
trajectory is achieved, the FB controller would see an
“ideal” error which is the difference between the
command and the desired temperature trajectory.
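A minimal sketch of how such a desired FB signal could be generated is given below. It makes several assumptions that are not in the text: the rise time is read as a 0-to-90% rise time, the controller copy is reduced to a plain discrete PI (the actual controller also has anti-windup, a lead/lag compensator, and gain scheduling), and the gains `kp` and `ki` are arbitrary illustrative values.

```python
import math


def desired_fb_signal(command, ts, rise_time, kp, ki):
    """Return (desired temperature, desired FB output) for a command sequence.

    A first-order reference model produces the desired temperature, and a
    PI stand-in for the FB controller is driven by the "ideal" error.
    """
    tau = rise_time / math.log(10.0)     # assumes a 0-to-90% rise-time definition
    pole = math.exp(-ts / tau)
    desired_temp, desired_fb = [], []
    y_ref, integral = 0.0, 0.0
    for r in command:
        desired_temp.append(y_ref)
        ideal_error = r - y_ref          # what the FB controller would see
        integral += ki * ideal_error * ts
        desired_fb.append(kp * ideal_error + integral)
        y_ref = pole * y_ref + (1.0 - pole) * r
    return desired_temp, desired_fb


ts = 0.05                                # 20 Hz sampling
command = [100.0] * 400                  # 100 degree step, as a deviation
desired_temp, desired_fb = desired_fb_signal(
    command, ts, rise_time=2.0, kp=0.5, ki=0.2)
```

The difference between this desired FB sequence and the FB output recorded during a run would then serve as the error signal for learning.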

With the desired and the actual FB, a FB output
based error signal is obtained. (This error signal is
time-advanced according to Eq. (11) and the footnote
comment thereafter to account for the process
and sample delay.) This (time advanced) error signal
and the impulse response vector were then used by
the LC algorithm to compute a FF signal. The algorithm
computes an appropriate FF signal that would
minimize this error according to the impulse response
model. For the unconstrained situation, this minimization
can be accomplished by exact inversion. However, it is
known that exact inversion is often not a good idea in
the face of uncertainties, a principle also found in
iterative image deblurring [6] and adaptive signal processing
[7]. We instead use soft inversion by numerically
iterating with Optimal Gradient Descent [4, 1].
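The details of Optimal Gradient Descent are given in the cited references rather than here, so the following is only a sketch under an assumption: it uses steepest descent with an exact line search on the squared error predicted by the impulse response model, stopped after a fixed number of iterations. The matrix, error vector, and function names are hypothetical.

```python
def toeplitz_lower(h):
    """Lower-triangular Toeplitz matrix built from an impulse response h."""
    n = len(h)
    return [[h[i - j] if i >= j else 0.0 for j in range(n)] for i in range(n)]


def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def rmatvec(m, x):
    """Product of the transpose of m with x."""
    return [sum(m[i][j] * x[i] for i in range(len(m))) for j in range(len(m[0]))]


def soft_inversion(m, e_no_ff, iterations=30):
    """Steepest descent with exact line search on ||e_no_ff + m f||^2.

    Stopping after a fixed number of iterations leaves the poorly
    conditioned directions only partly inverted.
    """
    f = [0.0] * len(e_no_ff)
    costs = []
    for _ in range(iterations):
        e = [d + y for d, y in zip(e_no_ff, matvec(m, f))]
        costs.append(sum(v * v for v in e))
        g = rmatvec(m, e)                # descent direction (half the gradient)
        mg = matvec(m, g)
        denom = sum(v * v for v in mg)
        if denom == 0.0:                 # already at a stationary point
            break
        alpha = sum(v * v for v in g) / denom   # exact line-search step
        f = [fi - alpha * gi for fi, gi in zip(f, g)]
    return f, costs


m = toeplitz_lower([-0.2, -0.5, -0.3, -0.1])    # hypothetical model
e_no_ff = [1.0, 0.8, 0.6, 0.4]                  # hypothetical error signal
f_soft, costs = soft_inversion(m, e_no_ff, iterations=30)
```

Each iteration cannot increase the predicted squared error, so the recorded `costs` sequence is non-increasing, while the early stop keeps the FF signal from chasing noise in the model.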

Thirty  (30)  numerical  iterations  are  performed  be¬ 
tween  process  repeats.  The  resultant  FF  signal  (shown 
in  Figure  6)  is  then  injected  with  proper  time  synchro¬ 
nization.  Notice  the  calculated  anticipatory  action  of 
the  FF  signal  that  precedes  the  step  command.  The 
result  is  a  significantly  faster  response  as  shown  in  Fig¬ 
ure  6.  The  (to  90%)  rise  time  is  about  three  (3)  times 
shorter  than  that  of  FB  only  control  and  yet  without 
overshoot.  However,  it  is  noticed  that  the  rise  time  is 
still  longer  than  the  desired,  and  the  wafer  tempera¬ 
ture  starts  the  transition  a  bit  too  early  (about  0.8 
second  or  so).  A  few  factors  could  have  contributed 
to  this,  including  measurement  and  numerical  errors 
and  perhaps  process  nonlinearity  that  would  affect  the 
fidelity  of  impulse  response  modeling. 

Using the same FF signal but delaying it by 0.5 second,
a very good result is obtained and is displayed
in Figure 7. The wafer temperature starts the upward
transition just on time and hits the target in less than
3 seconds with no overshoot. Compared to the 25
seconds or so settling time with FB only control, this
means a saving of more than 20 seconds for just one
processing step, which is quite significant for RTP.
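Delaying a stored FF sequence by a given time is a simple index shift. The helper below is a hypothetical illustration that assumes the FF signal is sampled at the same 20 Hz rate as the step response and that the start of the delayed signal is padded with its first value; neither detail is stated in the text.

```python
def delay_signal(signal, delay_seconds, sample_rate_hz):
    """Return a copy of `signal` delayed by a whole number of samples,
    padding the start with the first value of the original signal."""
    shift = min(int(round(delay_seconds * sample_rate_hz)), len(signal))
    if shift <= 0:
        return list(signal)
    return [signal[0]] * shift + list(signal[:len(signal) - shift])


ff = [0.0, 1.0, 2.0, 3.0, 4.0]           # hypothetical FF samples
ff_delayed = delay_signal(ff, 0.1, 20)   # 0.1 s at 20 Hz is 2 samples
half_second_shift = int(round(0.5 * 20)) # the 0.5 s delay is 10 samples
```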

Alternatively, in a more systematic manner as the
iterative LC algorithm suggests, the error signal from
the first try can be used by the LC algorithm to
come up with a refined FF for the second task repeat.
This procedure has been carried out numerically and
yielded a refined FF signal. However, due to a real-time
implementation error (which biases any FF signal
by its initial non-zero values), the refined FF is
realized with a step bias which, of course, prevents us
from getting the correct result. ^6 However, based on
qualitative observation and quantitative analysis (that
estimates the bias effect), the refined FF generated by
the LC algorithm would have worked if implemented
correctly. Future experiments are needed to validate
this.

Finally, “robustness” of the combined FF and FB
scheme is revealed in the following two cases. If the
(first) FF is delayed by 1 second (i.e., twice as much
as the 0.5 second delay), the wafer temperature is still
well behaved and responds fast, but with about 7°C
overshoot (for this 100°C step). This provides a “feel”
for the robustness of FF control. Another robustness
indicator is the repeatability run which was performed
three days after the first run was made and showed
remarkable consistency. In between, the reaction chamber
was taken apart once for examination and that
apparently did not impact the performance. The relatively
tight closed-loop FB control should be (at least
partially) credited for the apparent robustness. This
is one of the reasons for using the combined FF and FB
configuration.

5  Summary 

Feedforward learning is applied to control the wafer
temperature in a typical RTP reactor in a rugged
industrial environment. The preliminary experiments
have demonstrated a speed-up by a factor of 8, or 20
seconds saving in one processing step, quite significant
for RTP.

With all the required ingredients readily measurable,
the learning scheme, as described here and in [1], is
suitable for on-site tuning and learning. The very
encouraging experimental results seem to strongly support
the applicability of the theory and deserve serious
considerations for practical implementations.

^6 This is not a problem for the first try, because the first FF
signal starts with values very close to zero.

Figure 6: Wafer Temperature with FF Learning - 1

The LC scheme discussed so far is signal synthesis
based, i.e., appropriate FF signals are determined
and memorized for specific tasks. An implementation
method for practical systems is to use dynamic
FF filters driven by the recipe commands to generate
the appropriate FF signals, thereby effecting
generalization capability and memory efficiency [8, 9]. It is
interesting to note that these FF filters can be
determined using an extension of the LC scheme described
herein.
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ABSTRACT 

A new feedforward learning control (LC) scheme is reported in this paper. This new scheme is applicable to a class of smoothly nonlinear (repetitive) control tasks. Rapid reductions in tracking error are demonstrated using a single-input, single-output (SISO), nonlinear model of the Rapid Thermal Processing (RTP) wafer manufacturing process.

This scheme preserves the simplicity of the basic LC scheme reported in [1] by using the (measurable) impulse response model. Since the “impulse response” of a nonlinear process varies as the process trajectory moves along, adaptation is used to adjust the impulse response model between task repeats. Analytical justifications for the proposed adaptation are provided using a nonlinear equation solving analogy. Extension to the multi-input and multi-output (MIMO) case is also included.

1  Introduction 

It is well established that feedback (FB) control can provide a measure of robustness against process variations and disturbances. The speed of FB control, however, must be balanced with stability considerations, which are often limited by process characteristics such as time delay. A properly designed feedforward (FF) control, on the other hand, can complement FB control by promoting non-delay and anticipatory actions, leading to superior tracking or disturbance rejection. Combining FB and FF, one could have a robust, stable and yet agile control system.

To automatically determine the appropriate FF actions, a basic Learning Control (LC) scheme and its variants have been reported in [1] for repetitive control tasks including many manufacturing type processes. Based on very simple and practical ideas, this basic LC scheme can lead to rapid learning of FF signals for (almost) linear time-invariant (LTI) processes, SISO or MIMO. The only required information is the process loop impulse response, which can be readily measured on site, making this a practical on-site tuning tool.

In this paper, this simple and effective LC scheme is extended to the control of a class of nonlinear processes. Retaining the simplicity of “impulse response” modeling, this scheme uses between-task adaptation to adjust the impulse response model to successively achieve good approximations of the underlying nonlinear process dynamics. An interesting dual relationship between the adaptation mechanism and the basic LC allows the use of the same least-squares inversion type algorithm for a dual purpose. Demonstrated on a nonlinear Rapid Thermal Processing (RTP) semiconductor manufacturing process model, this new adaptive LC scheme shows rapid convergence after 3 adaptation task repeats.

The rest of the paper is organized as follows. In Section 2, the basic LC scheme and its variants are reviewed. In Section 3, the nonlinear RTP model is described first, and then the basic LC scheme is applied. It is shown that the measured “impulse response” of this nonlinear process model varies considerably with the size of the impulse input. Consequently, the basic LC scheme, which uses a single fixed “impulse response,” does not perform well. In fact, one of the impulse responses led to iterative learning divergence. Section 4 first shows that with the new adaptive LC scheme, the temperature trajectory tracking is (nearly) perfected after 3 adaptation task repeats. Then, in Section 5, this adaptive scheme is formulated using a prediction error minimization view for the SISO case. Minimizing the prediction error, the adaptation scheme adjusts the impulse response model between task repeats. The computation algorithm is a dual to the basic LC, thereby allowing software reuse. Section 6 offers an insightful illustration of this adaptive LC process using a “1D” nonlinear algebraic equation solving analogy. This illustration links this adaptive process to successive approximation methods used in solving nonlinear equations [2, 6]. Section 7 extends this adaptive LC scheme to the multi-input multi-output (MIMO) case.

AFOSR Contract No. F49620.


2 The Basic Learning Algorithm and Variants


Consider a combined FB and FF control configuration as shown in Figure 1. The basic LC scheme and its variants can be described by:

    f^{(i+1)} = f^{(i)} - K e^{(i)}    (1)

where e^{(i)} is the tracking error vector and f^{(i)} the FF signal in the i-th iteration, and K is a general (linear) operator which facilitates LC. This LC operator, K, may include time-advance operations to allow proper credit assignment for the learning process.

Realizing that both f and the command r contribute to e, straightforward analysis yields the following relationships in noiseless conditions:

    e^{(i)} = \mathcal{M} f^{(i)} + \mathcal{T} r    (2)
    f^{(i+1)} = (I - K\mathcal{M}) f^{(i)} - K\mathcal{T} r    (3)

where \mathcal{M} is the transfer operator from f to e, and \mathcal{T} is the transfer operator from the command r to e:

    \mathcal{M} = -(I + \mathcal{P}\mathcal{C})^{-1} \mathcal{P}    (4)
    \mathcal{T} = I - (I + \mathcal{P}\mathcal{C})^{-1} \mathcal{P}\mathcal{C}    (5)

Notice that if K is chosen as the inverse of \mathcal{M}, i.e., the transfer operator from f to e, then exact convergence within one repeat is possible. This is because if K = \mathcal{M}^{-1},

    f^{(i+1)} = (I - \mathcal{M}^{-1}\mathcal{M}) f^{(i)} - \mathcal{M}^{-1}\mathcal{T} r    (6)
              = \mathcal{P}^{-1} r    (7)
    e^{(i+1)} = 0

where f = \mathcal{P}^{-1} r is the “perfect” FF control signal for reproducing the command trajectory r.

In some conditions, the signal u_{fb}, instead of the tracking error, e, may be used for iterative learning, where u_{fb} is the deviation of the FB controller output from a reference value.

Figure 1: Combined Feedforward and Feedback Configuration


A more relaxed condition for tracking convergence is that |\lambda_i(I - MK)| < 1 or 0 < \lambda_i(MK) < 2, where \lambda_i denotes the eigenvalues, and matrix M is a time-domain (Toeplitz) realization of the transfer operator, \mathcal{M}, as shown in Eq. (10) below. This relaxed condition allows the use of a broader scope of algorithms, ranging from cases where \mathcal{M}^{-1} is not known exactly (e.g., due to measurement imprecision or process nonlinearity) to gradient descent type optimization algorithms. Now, instead of converging in one repeat, it will take multiple repeats to achieve satisfactory results.
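
The eigenvalue condition can be traced to a one-line error recursion. The derivation below is a sketch that assumes the update law f^{(i+1)} = f^{(i)} - K e^{(i)} and the noiseless error relation e^{(i)} = M f^{(i)} + T r, with M and T taken as matrix realizations of the operators. The discount factor \gamma is an illustrative symbol introduced here, not one used in the paper.

```latex
\begin{aligned}
e^{(i+1)} &= M f^{(i+1)} + T r
           = M\bigl(f^{(i)} - K e^{(i)}\bigr) + T r
           = (I - MK)\, e^{(i)}, \\
e^{(i)}   &= (I - MK)^{i}\, e^{(0)} \;\longrightarrow\; 0
           \quad \text{if } |\lambda_j(I - MK)| < 1 \text{ for all } j, \\
K = \gamma M^{-1} &\;\Rightarrow\; e^{(i+1)} = (1 - \gamma)\, e^{(i)},
           \quad \text{convergent for } 0 < \gamma < 2 .
\end{aligned}
```

With \gamma = 1 the error vanishes after one repeat, which is the exact-inverse case; any other \gamma in (0, 2) trades speed for tolerance to an imperfect inverse.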


Computationally, this basic algorithm involves the following (assuming the closed-loop FB system is LTI):

    f^{(i+1)}

    (10)

    (11)

    (12)


We assume the existence of the process “right inverse” [3, 1]. A necessary condition is that the number of process inputs must be no less than the number of outputs. With the existence of \mathcal{P}^{-1}, the existence of \mathcal{M}^{-1} for the closed loop is assured. When the right inverse does not exist, least-squares solutions can be sought [1] according to the computational formulation to follow.


Figure 2: Step Response of the RTP Model under FB-only Control


where h_1, h_2, h_3, ..., h_N is the closed-loop impulse response sequence from f to e (which can be readily measured on-site).

Notice that the above formulation is equally applicable to SISO and MIMO LTI problems, as the simulation results in [1] demonstrate.
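
As a minimal sketch of one basic LC step, the code below assumes the computation amounts to solving a lower-triangular Toeplitz system [h] df = -e for the FF increment df. That form is inferred from the surrounding text, not copied from the original equations. The names `toeplitz_mul` and `lc_step` and the `delay` parameter are hypothetical; `delay` drops leading zero h's together with the matching leading e samples when extra process delay is present.

```python
def toeplitz_mul(h, x):
    """Multiply the lower-triangular Toeplitz matrix built from h by x (a truncated convolution)."""
    return [sum(h[j] * x[k - j] for j in range(k + 1)) for k in range(len(x))]


def lc_step(h, e, delay=0):
    """Solve [h] df = -e for the FF increment df by forward substitution.

    h     : impulse response sequence from f to e (assumed model form)
    e     : tracking error from the latest task repeat, same length as h
    delay : number of leading zero h's (extra process delay) to drop,
            together with the same number of leading e samples
    """
    h_eff, e_eff = h[delay:], e[delay:]
    if not h_eff or h_eff[0] == 0:
        raise ValueError("first retained impulse response term must be nonzero")
    df = []
    for k in range(len(e_eff)):
        acc = -e_eff[k] - sum(h_eff[j] * df[k - j] for j in range(1, k + 1))
        df.append(acc / h_eff[0])
    return df


# Hypothetical numbers: two samples of extra delay, then a decaying response.
h = [0.0, 0.0, -0.5, -0.4, -0.32, -0.256]
e = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
df = lc_step(h, e, delay=2)
# Error predicted by the model after applying df (should be close to zero).
predicted = [a + b for a, b in zip(e[2:], toeplitz_mul(h[2:], df))]
```

Forward substitution is used here only because the assumed system is triangular; the constrained or regularized least-squares variants mentioned in the text would replace that solve.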

Variants of this basic computational scheme are described in [1], addressing the issues of process invertibility, phase non-minimality, process and measurement uncertainties, actuator constraints, etc. These methods are various constrained or regularized least-squares (LS) type solution procedures that are considered useful and robust for handling these practical issues.

3 The Basic LC Applied to a Nonlinear RTP Model

In this section, we apply the basic LC scheme, which is designed for LTI processes, to a nonlinear single-input single-output (SISO) RTP process model which is under stable, gain scheduled “linear” FB control. This RTP model exhibits a 1 second time delay, and is commanded to go from 650° C to 750° C “as quickly as possible” and preferably with no overshoot. The step response of the FB loop is shown in Figure 2. Apparently, to avoid overshoot, the FB controller takes a long time to eliminate the steady-state error. This performance is only marginally acceptable, and improvement is highly desired. To apply the basic LC to this nonlinear FB loop, an “impulse response” measurement has to be made. For a nonlinear process, the notion of “impulse response” is imprecise and, at best, a first-order approximation (in the sense of a Volterra series expansion [4]). It is expected to vary as a function of the process state and input. The degree of these variations depends on the degree of the nonlinearity.

In this study, the f to u_{fb} “impulse response” model is used.

Figure 3 shows three estimated “impulse responses” (corresponding to the transfer from the FF signal to the FB controller output). The first two are estimated by inputting a FF step of 10% and 30% power command, respectively, into the model. Then, the “impulse responses” are estimated by differencing the step responses. The third “impulse response,” h_{T100}, is inferred from the actual temperature response to the temperature step command (650° C to 750° C) according to the footnote discussion. It is apparent that these three “impulse responses” exhibit quite different gain and dynamic characteristics. Since the third “impulse response” is the one that in practice can save a system identification experiment, we use it in the simulation for studying the basic LC. Figure 4 shows the results after one task repeat and after three task repeats. It is apparent that with this particular “impulse response,” h_{T100}, the basic LC seems to diverge. In practical applications, however, one may use only a fraction of the FF signal recommended by the basic LC algorithm to increase the chance of convergence. In fact, with this divergent h_{T100}, good results are obtained after many iterations if the basic LC recommended FF signal is discounted by 50% at each iteration. Certainly, one could ask questions such as: what is the minimum amount of discount that would still guarantee convergence, etc. The point, however, is that the many iterations mean potentially many calibration runs, which would make the method less attractive. Instead, using the proposed new adaptive learning FF control method, in the next section we present some very attractive results with convergence virtually achieved after two or three adaptations.

If u_{fb} is used for learning, then the impulse response sequence should be from f to u_{fb}.

Equations (9) and (10) have assumed that the only time delay is the sample delay, i.e., 1 delay. If additional process delays exist, some of the leading h's would be zero and should be removed from the equation, together with the corresponding leading e's.

This is because for single-input single-output (linear) processes, the transfer function from the command r to the wafer temperature y is the same as the transfer function from FF (f) to FB (u_{fb}) with the exception of the sign. Knowing the r to y temperature step response from the FB-only performance (and hoping that the nonlinearity is not too severe), one can estimate the FF to FB impulse response without having to conduct special system identification experiments. In practical applications, these savings could be meaningful.

Figure 3: Three (3) Estimated “Impulse Responses” for the Nonlinear RTP Model

Figure 4: Basic LC Results using “Impulse Response” Model - h_{T100}
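
The differencing step used in this section to turn a measured step response into an “impulse response” estimate can be sketched as follows. The function name `impulse_from_step`, its parameters, and the first-order test data are hypothetical and are not taken from the RTP model. The `sign` argument reflects the noted sign relationship between the r-to-y and f-to-u_{fb} transfers for a linear SISO loop.

```python
def impulse_from_step(step_response, step_size, sign=1.0):
    """Estimate an impulse response by differencing a sampled step response.

    step_response : output deviation from its initial value at each sample
    step_size     : amplitude of the applied step
    sign          : use -1.0 to convert an r-to-y response into an
                    f-to-u_fb one (same transfer function, opposite sign)
    """
    h, previous = [], 0.0
    for value in step_response:
        h.append(sign * (value - previous) / step_size)
        previous = value
    return h


# Hypothetical first-order temperature response to a 100-degree step command.
step = [100.0 * (1.0 - 0.8 ** (k + 1)) for k in range(10)]
h_est = impulse_from_step(step, step_size=100.0, sign=-1.0)
```

Because the estimate is normalized by the step size, a nonlinear process will generally return different sequences for different step amplitudes, which is the variation Figure 3 is described as showing.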

4 Adaptive LC Applied to the Nonlinear RTP Model

Using the adaptation scheme to be described in the next section, rapid convergence is demonstrated here with the nonlinear RTP model. Figure 5 (upper) shows the results after 4 task repeats (or 3 adaptations). The FB-only response is also shown in the upper plot to contrast the performance improvement effected by the adaptive LC. The temperature response has virtually conformed to the desired response, which is that of a first-order system with a 0.5 second time constant. The desired response is displayed in the lower plot for ease of comparison.

Figure 6 compares the adapted impulse response with the original “impulse response,” h_{T100}, which is used prior to adaptation by the basic LC algorithm to perform the first task repeat. Figure 7 shows the sequence of intermediate results with 0, 1, 2 and 3 adaptations (or equivalently, with 1, 2, 3 and 4 repeats). It is noticed that the learning process has converged rapidly after 2 adaptations. With this kind of convergence, this technique can be very useful in practical manufacturing processes that exhibit a considerable degree of nonlinearity, as this RTP model does. By investing in a few calibration runs, one may gain substantial improvement over many production runs. Once a near optimal FF signal is established, this technique can also be used for fine tuning as the equipment drifts over periods of time.

Since learning is “from scratch,” i.e., the initial FF is zero, no adaptation takes place until after the first task repeat, and therefore there are only 3 adaptations in 4 task repeats. (Since learning has stopped after 4 repeats, no adaptation is made after the 4th repeat either.)

Figure 5: Adaptive Learning Control Result: nonlinear RTP model (after 4 task repeats)

Figure 6: Impulse Response Models: before and after adaptation


Before discussing the details of this new adaptation scheme in the next section, a case subject to periodic measurement disturbance (with random phase) is presented here to demonstrate the algorithm's robustness in a noisy environment. Figure 8 shows the results after 4 repetitive applications of the adaptive LC algorithm (i.e., 3 adaptations). The learned response is plotted against the (step) command, the response of the reference model, and the FB-only response subject to the same random-phased disturbance condition. Apparently, the learned response tracks the reference response quite well in the face of the disturbance. It is noticed that the adaptive LC does not seem to aggravate the disturbances, and the steady-state disturbance level remains about the same as that in the FB-only case. Note that in this simulation, no special filtering is used. The robust performance is a result of 1) adapting only the early portion of the impulse response model, and 2) using “soft” inversion instead of “exact” inversion for LC computations between task repeats to avoid amplifying noise. In practical applications, proper filtering [1, 7] of the signals would add extra protection against disturbances and noise.

Figure 7: Adaptive Learning Control Results: the learning sequence

Figure 8: Adaptive Learning Control: a noisy case


5  The  New  Adaptation  Scheme  -  SISO 
Case 

Since the notion of “impulse response” for a nonlinear process is imprecise, the model would vary as a function of the process state and input. The idea then is to adaptively adjust the impulse response model as the learning process proceeds. The basis for adaptation is the mismatch between what is expected and what one actually gets. For ease of presentation, we focus on the single-input single-output (SISO) case in this section. Extension to the MIMO case is considered in Section 7.

This is a simple robustification procedure discussed in the next Section.

Also, see Section 5 for more details on this.

In [7], experimental work of applying LC to wafer temperature control in a typical RTP reactor is discussed. Considerable sensor disturbances are encountered and successfully removed. The operating condition, however, is linear enough that the basic LC proves to be quite adequate.

Figure 9: Prediction Error based Adaptive LC Scheme

Figure 9 illustrates this prediction error based adaptation scheme. After the i-th task repeat, the basic LC scheme is used to determine the FF increment, df^{(i)}, for the next repeat based on the best impulse response model, h^{(i)}, that one has at the time. Applying this FF increment, df^{(i)}, to the process, one can expect (predict) a new (tracking) error according to the current impulse response model:

    \hat{e}^{(i+1)} = e^{(i)} + [h^{(i)}] df^{(i)}    (13)

where e^{(i)} is the actual measured (tracking) error (i.e., u_{fb} in our case) from the i-th task repeat, and [h] stands for the Toeplitz matrix form of h as shown in Equation (10), which actually performs convolution between h and df. Before convergence is achieved, the expected error for the (i+1)-th task repeat will be different from the actual observed error due (primarily) to the errors associated with the impulse response model. This difference, termed prediction error, is defined as:

    \varepsilon^{(i+1)} = e^{(i+1)} - \hat{e}^{(i+1)}    (14)

Substituting for \hat{e}^{(i+1)}:

    \varepsilon^{(i+1)} = e^{(i+1)} - e^{(i)} - [h^{(i)}] df^{(i)}    (15)
                        = e^{(i+1)} - e^{(i)} - [df^{(i)}] h^{(i)}    (16)

where we have used the fact that the convolution between h and df is commutative to arrive at the second equality. This is a key technical step leading to the following adaptation algorithm: solve for dh^{(i)} from

    [df^{(i)}] dh^{(i)} = \varepsilon^{(i+1)}    (18)

and adapt h according to:

    h^{(i+1)} = h^{(i)} + dh^{(i)}    (19)
Solving this matrix equation, Eq. (18), can be approached by exact inversion, constrained least-squares, or “soft” inversion using a finite number of numerical iterations with optimal gradient descent, etc. For optimal gradient descent, the gradient (with respect to h) is readily expressed as:

    g = \frac{1}{2} \frac{\partial (\varepsilon^{(i+1)T} \varepsilon^{(i+1)})}{\partial h^{(i)}}    (20)
      = -[df^{(i)}]^T \varepsilon^{(i+1)}    (21)

Hence, the Optimal Gradient algorithm for the adaptation of h is [5]:

    h^{(i)} \leftarrow h^{(i)} - \alpha g    (22)
    g = -[df^{(i)}]^T \varepsilon^{(i+1)}    (23)
    \alpha = (g^T g) / (g^T [df^{(i)}]^T [df^{(i)}] g)    (24)

Numerically iterating with the Optimal Gradient Algorithm finitely many times performs soft inversion for Eq. (18) at each task repeat.

The details are described in our previous paper [1]. (In the face of uncertainties, soft inversion is often preferred to exact inversion - a principle also found in iterative image deblurring [8] and adaptive signal processing [9].)
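
The idea of “soft” inversion by a few optimal-gradient steps can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: it assumes the adaptation equation has the form [df] dh = eps with [df] a lower-triangular Toeplitz matrix, and it uses steepest descent with the exact line-search step size. All function names and the test data are hypothetical.

```python
def toeplitz_mul(c, x):
    """[c] x: lower-triangular Toeplitz matrix built from c times x (truncated convolution)."""
    return [sum(c[j] * x[k - j] for j in range(k + 1)) for k in range(len(x))]


def toeplitz_tmul(c, y):
    """[c]^T y: transpose of the same Toeplitz matrix times y (truncated correlation)."""
    n = len(y)
    return [sum(c[k - j] * y[k] for k in range(j, n)) for j in range(n)]


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))


def soft_invert(df, eps, iterations):
    """Approximately solve [df] dh = eps with a few steepest-descent steps.

    Each step uses the exact line-search step size
    alpha = (g'g) / (g'[df]'[df]g), so the residual norm never increases.
    """
    dh = [0.0] * len(eps)
    for _ in range(iterations):
        residual = [e - p for e, p in zip(eps, toeplitz_mul(df, dh))]
        g = [-v for v in toeplitz_tmul(df, residual)]  # gradient of 0.5*||residual||^2
        a_g = toeplitz_mul(df, g)
        denom = dot(a_g, a_g)
        if denom == 0.0:  # gradient vanished: already at a solution
            break
        alpha = dot(g, g) / denom
        dh = [h - alpha * gi for h, gi in zip(dh, g)]
    return dh


def residual_norm(df, dh, eps):
    """Euclidean norm of eps - [df] dh."""
    r = [e - p for e, p in zip(eps, toeplitz_mul(df, dh))]
    return dot(r, r) ** 0.5


# Hypothetical data: a known model correction observed through [df].
df = [1.0, 0.5, 0.25, 0.1, 0.0, 0.0, 0.0, 0.0]
dh_true = [0.2, 0.1, 0.05, 0.0, 0.0, 0.0, 0.0, 0.0]
eps = toeplitz_mul(df, dh_true)
r0 = residual_norm(df, [0.0] * 8, eps)
r5 = residual_norm(df, soft_invert(df, eps, 5), eps)
r50 = residual_norm(df, soft_invert(df, eps, 50), eps)
```

Stopping after a small number of iterations leaves part of the residual unexplained, which is the intended effect when the residual is partly measurement noise.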

Notice that this matrix equation for h adaptation is exactly the dual to the basic LC algorithm which solves for df (Eq. (10)). The roles of dh and df are exchanged in these two dual problems. This allows the reuse of the software developed for the basic LC in the adaptation of h. It also allows the variant LC algorithms [1] to be used in h adaptation. For instance, the rate-constrained LC algorithm might be useful in the adaptation of h to discourage measurement noise from entering the adapted h. However, this rate constraining should be used primarily on the “tail” portion of the impulse response model, h. This is because the early portion of h has a high signal-to-noise ratio (SNR) and usually exhibits fast transient response due to the dominant process dynamics, and therefore should not be penalized for rate of change.

Implemented in a simplified manner, one may choose to adapt only the early portion of the h model and leave the rest intact. This way, the potential contamination by measurement noise is minimized (provided that a reasonable “tail” portion of the initial h is available). For example, if only the first k terms of h are adapted, the adaptation matrix equation becomes:

    (25)

This simple scheme actually works quite effectively with noise according to the simulated results. It also seems to robustify the adaptation scheme even in a noiseless environment. This is because sometimes “tail swings” may occur in the intermediate runs before convergence is achieved. Being able to hold fast onto the steady-state portion of the h model would encourage consistently good steady-state convergence.
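
One plausible form of the reduced adaptation equation is written out below as a sketch. It assumes that [df] is the lower-triangular Toeplitz matrix of the FF increment samples df_1, ..., df_N, that only the first k columns are retained, and that \varepsilon_1, ..., \varepsilon_N are the prediction error samples. The indexing is an assumption made for illustration and may differ from the original Eq. (25).

```latex
\begin{bmatrix}
df_1   &          &        &            \\
df_2   & df_1     &        &            \\
\vdots & \vdots   & \ddots &            \\
df_k   & df_{k-1} & \cdots & df_1       \\
\vdots & \vdots   &        & \vdots     \\
df_N   & df_{N-1} & \cdots & df_{N-k+1}
\end{bmatrix}
\begin{bmatrix} dh_1 \\ dh_2 \\ \vdots \\ dh_k \end{bmatrix}
=
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_N \end{bmatrix}
```

With k < N this is an overdetermined system of N equations in k unknowns, so it would be solved in a least-squares or soft-inversion sense while the remaining terms of h are left unchanged.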

6  Justifications  of  the  Adaptation 
Scheme 

In this Section, we offer a simplified but insightful illustration of the workings of this adaptation scheme. By reducing the dimension of the h vector to 1, the problem becomes that of solving a single-variable nonlinear algebraic equation. In that context, it then becomes clear that the proposed adaptation is the secant method [6]. The secant method is an effective method for solving nonlinear algebraic equations and is convergent if the initial trial solution is close enough to the true solution. Visualizing the iterative solution process provides deeper understanding of this adaptation scheme and helps sketch out possible formal proofs of convergence by qualifying the underlying nonlinearity. Extending from the nonlinear algebraic analogy to a dynamic setting can be accomplished using the concepts from operator theory and functional analysis [2].

Figure 10: A Nonlinear Algebraic Equation Solving Analogy

Considering a one-dimensional case, Figure 10 depicts a problem of finding the right input, x*, that will produce the desired outcome, y*, on a nonlinear curve. We decide to use linear model templates to successively approach the right solution. To get started, we conduct a “system identification” experiment by putting in input, x_0, and observing the corresponding y value on the nonlinear curve, i.e., y_0. Based on this experiment, we build the first linear model, h^{(0)}, which is the equation describing a line passing through the origin with slope of y_0/x_0. Using this initial model, we now proceed to find x* by determining which x value would give y* as output. By solving the equation, h^{(0)}(x) = y*, it is determined that x_1 would be the “right” x value. However, the nonlinear curve responds to x_1 with a value, y_1, that is quite different from the desired y*, as Figure 11 illustrates. The difference between the actual, y_1, and the expected, y*, is used (together with the amount of x incremented, i.e., x_1) to determine the modification that should be applied to h^{(0)}. After this modification (adaptation), the updated linear model, h^{(1)}, is the line passing through the origin and the point (x_1, y_1). Extending this line to find the value of x that would produce the desired y*, we obtain x_2.

However, instead of reaching the desired y*, we obtain the value y_2 by applying x_2 to the nonlinear curve. Notice that y_2 is still considerably less than the desired y*. The difference between y_2 and y* is the prediction error, which is used (together with the amount of x incremented, i.e., x_2 - x_1) to determine the adaptation needed for h^{(1)}. The updated linear model, h^{(2)}, is the line passing through the points (x_1, y_1) and (x_2, y_2). Using this updated linear model, it is recommended that x_3 be used to reach the desired goal.

Repeating this prediction and correction cycle one more time, we have another new linear model, h^{(3)}, which is the line passing through points (x_2, y_2) and (x_3, y_3). Now this new linear model, h^{(3)}, recommends x_4 as the input for achieving our goal. This time, the recommendation is very close to being optimal (since the result of applying x_4 to the nonlinear curve is very close to the desired response, y*), and the process has practically converged.

It is noted that the above procedure is identical to the so-called secant method in algebraic equation solving [6]. The secant method is a locally convergent method with a respectable convergence order of 1.62. The fact that the secant method does not require the evaluation of any derivatives (i.e., gradients) is of special advantage here, because it translates to the adaptive LC scheme not requiring any special system identification experiments except for the first run.

Figure 11: The Nonlinear Algebraic Equation Solving Analogy (in action)
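
The one-dimensional analogy can be run numerically. The sketch below follows the sequence described in this section: the first two linear models pass through the origin, and later ones pass through the last two observed points. The function name `adaptive_secant`, the test curve y = 4*sqrt(x) and the target value are hypothetical choices made for illustration.

```python
import math


def adaptive_secant(curve, y_target, x0, repeats):
    """Return the inputs x_1, x_2, ... recommended by successive linear models."""
    recommendations = []
    a, b = (0.0, 0.0), (x0, curve(x0))  # first model: line through origin and (x0, y0)
    for i in range(repeats):
        if b[1] == a[1] or b[0] == a[0]:  # degenerate model: no further update possible
            break
        slope = (b[1] - a[1]) / (b[0] - a[0])
        x_new = b[0] + (y_target - b[1]) / slope
        point = (x_new, curve(x_new))
        recommendations.append(x_new)
        # Second model passes through the origin; later ones through the last two points.
        a, b = ((0.0, 0.0), point) if i == 0 else (b, point)
    return recommendations


def curve(x):
    """Hypothetical smooth nonlinear curve; y = 8 is reached at x = 4."""
    return 4.0 * math.sqrt(x)


xs = adaptive_secant(curve, y_target=8.0, x0=1.0, repeats=6)
```

For this curve the fourth recommendation is already within a few hundredths of the true input, which mirrors the behavior described for x_4 above.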


7  The  New  Adaptation  Scheme  - 
MIMO  Case 

In this section, we extend the SISO adaptation algorithm to the multi-input multi-output (MIMO) case. For MIMO cases, there are some added technical complexities that make it more challenging and illuminating.

First of all, the key technical step in the development of SISO adaptation, i.e., from Eq. (15) to (16), no longer holds. This is because the MIMO nature makes h a matrix and df a composite “long” vector which contains segments of multiple-input df's. “Commutation” is no longer as straightforward as it is in Eqs. (15) and (16). Instead, the following equation (i.e., essentially Eqs. (15) and (18) combined)

Let us use a 2 x 2 MIMO process for illustration. Notice that originally \varepsilon_k is a 2 x 1 column vector corresponding to the 2 output “prediction” errors at time k. Now, since its transpose is a 1 x 2 row vector, the left-hand side of Eq. (27) has become an N x 2 matrix with the first column denoting the first output “prediction” errors and the second column the second output “prediction” errors. Likewise, the second entity on the right-hand side of the equation has become a 2N x 2 matrix. The first column corresponds to the impulse response models of the first output (due to the 2 inputs). These observations suggest that the MIMO adaptation could be performed one output at a time. That is, the above matrix equation can be split into 2 sets of [N x 1] = [N x 2N][2N x 1] vector-matrix equations which can be (readily) solved using the SISO adaptation procedure.

However, noting that there are 2N h's to adapt, whereas there are only N equations, this split formulation seems underdetermined. (This is partly due to the redundancy in impulse response modeling.) Fortunately, this problem can be overcome by invoking the robust SISO adaptation scheme, in which only a portion of the h model needs adaptation. Therefore, this MIMO adaptation problem can be readily solved one output at a time using the robust SISO adaptation scheme.

One more technical detail is that even with more equations than unknowns (by adapting only a portion of h), this problem can still be ill-conditioned. This happens when the columns of, say, [df_1, df_2, df_3, ..., df_N]^T are (nearly) identical, thereby rendering the N x 2N matrix in Eq. (29) (nearly) singular. Physically, this means that when the df's for the 2 inputs are nearly the same, the multi-input aspect of the MIMO process is not adequately probed to allow reliable identification of individual input contributions. This phenomenon, called multicollinearity [10] in multi-variable regression, is known to lead to erroneous models. In practice, when the process does not respond to the 2 inputs equally, this may not be a problem. This is because then the 2 df's are likely to be different. On the other hand, if the process is fully input symmetric, it may be necessary to perturb the 2 df's so that adequate excitations are available for reliable adaptation.
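
The loss of identifiability can be seen in a short worked equation. The notation below is introduced for illustration only: df_a and df_b denote the FF increment sequences of the two inputs, [df_a] and [df_b] their Toeplitz forms, and dh_{1a}, dh_{1b} the model corrections from each input to output 1. The sketch assumes the per-output split described above.

```latex
\begin{aligned}
\varepsilon_{1} &= \begin{bmatrix} [df_{a}] & [df_{b}] \end{bmatrix}
                   \begin{bmatrix} dh_{1a} \\ dh_{1b} \end{bmatrix}
                 = [df_{a}]\, dh_{1a} + [df_{b}]\, dh_{1b}, \\
df_{a} = df_{b} &\;\Rightarrow\;
\varepsilon_{1} = [df_{a}]\,\bigl(dh_{1a} + dh_{1b}\bigr).
\end{aligned}
```

In the second line only the sum dh_{1a} + dh_{1b} affects the prediction error, so the individual corrections cannot be separated; perturbing the two df's so that they differ restores the distinction between the two column blocks.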

8  Summary 

A new adaptive learning feedforward control scheme is introduced in this paper. This new scheme retains all the key features of the previous basic LC scheme [1], such as modeling simplicity and applicability with or without FB control, SISO or MIMO. Above and beyond, this new scheme extends the applicability of LC to include a class of smoothly nonlinear processes in an effective way. Rapid reductions in tracking errors are demonstrated using a nonlinear RTP manufacturing model. Computational algorithms are given along with robustness considerations. Analytical justifications are given using an analogy with the secant method used in algebraic equation solving.